HTTP and Web
Tier: Intermediate
Commands covered: fetch, fetchpost
Per-command flag reference lives in /docs/help/. This page is the workflow layer: when to reach for each command and how they compose.
Two commands for hitting HTTP APIs from CSV data, one row at a time. Both have:
- HTTP/2 with adaptive flow control for high throughput
- RFC RateLimit-aware dynamic throttling that adapts to the server's rate-limit headers
- jaq (a jq clone) integration for extracting fields from JSON responses
- Four caching options: in-memory LRU (default, 2M entries), persistent disk cache, Redis cache, or no cache
fetch does HTTP GET. fetchpost does HTTP POST with HTML form encoding (default) or a MiniJinja-templated JSON body.
For the deep-dive, see docs/Fetch.md.
| If you want to… | Use | Notes |
|---|---|---|
| Enrich each row by GET-ing a URL | fetch | URL column or --url-template |
| POST each row to a service (form-encoded) | fetchpost | One column = one form field |
| POST each row as JSON via a MiniJinja template | fetchpost --payload-tpl | Template renders any content-type |
| Extract specific fields from a JSON response | --jaq '...' | jq syntax |
| Cache responses across runs | --disk-cache or --redis-cache | Avoid re-hitting an API for known inputs |
| Throttle to a known rate | --rate-limit N | Plus auto-throttle on RateLimit headers |
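Before choosing a cache, it can help to know how many rows repeat the same URL, since only distinct URLs need a real request once responses are cached. The sketch below is a rough check with standard POSIX tools, not a qsv feature. The names `url_stats` and `sample_urls.csv` are hypothetical, and it assumes the URL sits in the first column of a simple CSV with no quoted commas.

```shell
# Hypothetical helper: compare total vs. distinct values in the first column
# of a simple CSV (no quoted commas), skipping the header row.
url_stats() {
  total=$(tail -n +2 "$1" | cut -d, -f1 | wc -l | tr -d ' ')
  distinct=$(tail -n +2 "$1" | cut -d, -f1 | sort -u | wc -l | tr -d ' ')
  echo "total=$total distinct=$distinct"
}

# Sample input (made up for illustration): three rows, one repeated URL.
printf 'URL\nhttps://api.zippopotam.us/us/90210\nhttps://api.zippopotam.us/us/10001\nhttps://api.zippopotam.us/us/90210\n' > sample_urls.csv

url_stats sample_urls.csv   # prints: total=3 distinct=2
```

A large gap between the two numbers suggests that caching will save many requests; a small gap suggests throttling matters more than caching.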
fetch issues one HTTP GET per row. It accepts two input styles:
- A URL column: the value of one column is the URL to GET.
- A URL template: --url-template 'https://api.example.com/users/{user_id}/orders' substitutes column values into the template.
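To see what the template style produces, the URLs can be previewed without making any requests. The following is a minimal sketch of the substitution idea using plain shell, not qsv's own templating code. The names `preview_urls` and `users.csv` are hypothetical, and it assumes `user_id` is the first column of a simple CSV with no quoted fields.

```shell
# Hypothetical helper: print the URL each row would yield for the template
# https://api.example.com/users/{user_id}/orders.
# Assumes user_id is the first column and the CSV has no quoted fields.
preview_urls() {
  tail -n +2 "$1" | cut -d, -f1 | while read -r user_id; do
    printf 'https://api.example.com/users/%s/orders\n' "$user_id"
  done
}

# Sample input (made up for illustration).
printf 'user_id\n42\n1007\n' > users.csv

preview_urls users.csv
```

Checking a few rendered URLs this way can catch a wrong column name or stray whitespace before a long run starts.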
By default fetch writes one minified JSON response per line (JSONL). With --new-column COL, it adds the response (or a --jaq-extracted value) as a new column to the original CSV.
Example: enrich a list of US ZIP codes with city/state (Zippopotamus API)
# data.csv has a "URL" column with values like
# https://api.zippopotam.us/us/90210
qsv fetch URL data.csv \
--new-column CityState \
--jaq '[ .places[0]."place name", .places[0]."state abbreviation" ]' \
> data_with_city_state.csv

Example: URL template to fetch the GitHub stargazer count for any repo
# repos.csv has a "repo" column with values like "dathere/qsv"
qsv fetch --url-template 'https://api.github.com/repos/{repo}' \
--new-column stars \
--jaq '.stargazers_count' \
--http-header "Authorization: Bearer $GITHUB_TOKEN" \
repos.csv > repos_with_stars.csv

Example: fetch NOAA GHCN-Daily station data with a persistent disk cache
# stations.csv has a "url" column with NOAA station data URLs
qsv fetch url stations.csv \
--disk-cache \
--disk-cache-dir ~/.qsv-cache/noaa \
--new-column raw_data > stations_with_data.csv

The disk cache means a second run reuses the previous downloads, which is useful for incremental ETL.
Example: rate-limit to 10 requests/sec with auto-back-off on RateLimit headers
qsv fetch URL data.csv --rate-limit 10 --new-column response > with_response.csv

Example: use Redis as the cache (shared across machines / CI runs)
qsv fetch URL data.csv \
--redis-cache \
--redis-cache-conn redis://my-redis-host:6379 \
--new-column response > out.csv

Example: report mode, which writes a per-row HTTP status / timing TSV instead of merging into the output
qsv fetch URL data.csv --report > report.tsv
# Columns: row,url,status,response_time_ms,...

See also: /docs/help/fetch.md, docs/Fetch.md, jaq, Recipe: Fetch & Cache.
fetchpost issues one HTTP POST per row. It supports two body styles:
- HTML form encoding (default, Content-Type: application/x-www-form-urlencoded): each column listed in <column-list> becomes one form field.
- MiniJinja-templated payload (--payload-tpl <file>): render any content type. Default is application/json; override with --content-type.
Example: POST each row to an ML inference endpoint as JSON
{# payload.j2 #}
{
"model": "classifier-v3",
"features": {
"text": {{ text | tojson }},
"country": {{ country | tojson }}
}
}

qsv fetchpost https://ml.example.com/infer \
--payload-tpl payload.j2 \
--new-column prediction \
--jaq '.label' \
feedback.csv > feedback_classified.csv

Example: POST form-encoded data (simpler, no template file)
qsv fetchpost https://api.example.com/submit name,email,score \
--new-column response_id \
--jaq '.id' \
responses.csv > responses_logged.csv

The columns name, email, and score are sent as name=...&email=...&score=... in the request body.
Example: bulk OCR on a CSV of image URLs (rate-limited + cached)
qsv fetchpost https://ocr.example.com/extract image_url \
--rate-limit 5 \
--disk-cache \
--new-column extracted_text \
--jaq '.text' \
images.csv > images_with_text.csv

Example: custom content type (e.g., XML)
qsv fetchpost https://soap.legacy.example.com/api \
--payload-tpl xml_envelope.j2 \
--content-type 'application/xml' \
--new-column result \
--jaq '."SOAP-ENV:Envelope".Body.Response.Status' \
records.csv

See also: /docs/help/fetchpost.md, docs/Fetch.md, MiniJinja, template.
Both fetch and fetchpost share the same caching options. Pick based on your access pattern:
| Cache | Best for | Notes |
|---|---|---|
| In-memory LRU (default) | One-shot runs, small datasets | Lost when process exits; 2M entries by default; tune with --mem-cache-size |
| Disk (--disk-cache) | Repeated runs against stable APIs | Stored at ~/.qsv-cache/fetch by default; configurable TTL |
| Redis (--redis-cache) | Distributed teams, CI/CD | Shared cache across machines; needs a Redis server |
| No cache (--no-cache) | Live data that must not be cached | Pricing endpoints, real-time stock data, etc. |
For details on cache invalidation, the disk-cache TTL, and Redis connection strings, see docs/Fetch.md.
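One way to apply the table above in a wrapper script is to pick the cache flag from the environment. This is a minimal sketch under stated assumptions: the function name `cache_flags`, the `LIVE_DATA` variable, and the use of `CI` as a signal for a shared cache are all hypothetical choices for illustration. Only the three flags themselves come from this page.

```shell
# Hypothetical helper: choose a cache flag for fetch/fetchpost.
#   LIVE_DATA set -> responses must not be cached
#   CI set        -> assume a shared Redis cache is available
#   otherwise     -> persistent disk cache for repeated local runs
cache_flags() {
  if [ -n "${LIVE_DATA:-}" ]; then
    echo "--no-cache"
  elif [ -n "${CI:-}" ]; then
    echo "--redis-cache"
  else
    echo "--disk-cache"
  fi
}

# Show the flag selected for the current environment.
cache_flags
```

A call might then look like `qsv fetch URL data.csv $(cache_flags) --new-column response > out.csv`. With --redis-cache, a reachable Redis server is still required.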
- Command Reference (index)
- docs/Fetch.md: canonical reference
- Geospatial → geocode: local geocoding (no HTTP per row)
- Scripting (Luau / Python) → template: MiniJinja in non-HTTP contexts
- Cookbook → Fetch & Cache
- Stats Cache & Caching: the wider qsv caching story
- Integrations — CKAN APIs, GitHub Actions
qsv is dual-licensed MIT / Unlicense.