Joel Natividad edited this page May 13, 2026 · 2 revisions

HTTP & Web

Tier: Intermediate
Commands covered: fetch, fetchpost

Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.

qsv has two commands for calling HTTP APIs from CSV data, one row at a time. Both support:

  • HTTP/2 with adaptive flow control for high throughput
  • RFC RateLimit-aware dynamic throttling that adapts to the server's rate-limit headers
  • jaq (a jq clone) integration for extracting fields from JSON responses
  • Four caching options: in-memory LRU (default, 2M entries), persistent disk cache, Redis cache, or no cache
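To make the RateLimit-aware throttling idea concrete, here is a minimal bash sketch of how a client could turn rate-limit response headers into a wait time. It assumes the older draft header form (separate RateLimit-Remaining and RateLimit-Reset headers, with reset given in seconds) and a simple "spread the remaining quota over the window" policy. The function name next_delay_s is hypothetical, and this is not qsv's actual throttling logic.

```bash
#!/usr/bin/env bash
# Sketch only: derive a delay from RateLimit-* response headers.
# Assumes draft-style headers "RateLimit-Remaining: N" and "RateLimit-Reset: S"
# (S = seconds until the quota window resets). Not qsv's implementation.

# next_delay_s HEADERS: print whole seconds to wait before the next request.
next_delay_s() {
  local remaining="" reset="" name value
  while IFS=':' read -r name value; do
    value=${value//[[:space:]]/}            # drop spaces and any trailing CR
    case ${name,,} in                       # header names are case-insensitive
      ratelimit-remaining) remaining=$value ;;
      ratelimit-reset)     reset=$value ;;
    esac
  done <<< "$1"

  if [[ -z $remaining || -z $reset ]]; then
    echo 0                                  # no rate-limit headers: no extra delay
  elif (( remaining == 0 )); then
    echo "$reset"                           # quota exhausted: wait for the reset
  else
    echo $(( reset / remaining ))           # spread remaining quota over the window
  fi
}

headers=$'HTTP/2 200\nRateLimit-Limit: 100\nRateLimit-Remaining: 10\nRateLimit-Reset: 30'
next_delay_s "$headers"   # prints 3
```

Spreading the quota evenly avoids bursting through the remaining requests and then stalling until the window resets.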

fetch does HTTP GET. fetchpost does HTTP POST with HTML form encoding (default) or a MiniJinja-templated JSON body.

For the deep-dive, see docs/Fetch.md.

Quick decision table

| If you want to… | Use | Notes |
|---|---|---|
| Enrich each row by GET-ing a URL | fetch | URL column or --url-template |
| POST each row to a service (form-encoded) | fetchpost | One column = one form field |
| POST each row as JSON via a MiniJinja template | fetchpost --payload-tpl | Template can render any content type |
| Extract specific fields from a JSON response | --jaq '...' | jq syntax |
| Cache responses across runs | --disk-cache or --redis-cache | Avoid re-hitting an API for known inputs |
| Throttle to a known rate | --rate-limit N | Plus auto-throttle on RateLimit headers |

fetch

HTTP GET per row. Two input styles:

  • A URL column — the value of one column is the URL to GET.
  • A URL template: --url-template 'https://api.example.com/users/{user_id}/orders' substitutes column values into the template.
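The placeholder substitution behind --url-template can be pictured with a small bash sketch: each {column} in the template is replaced by that column's value for the current row. The helper name render_url_template and its comma-separated header/row arguments are hypothetical; it does no CSV quoting and no percent-encoding of the substituted values, and it is not how qsv implements the option.

```bash
#!/usr/bin/env bash
# Sketch only: fill {column} placeholders from one CSV row.
# Naive comma split: quoted fields containing commas are not handled.

# render_url_template TEMPLATE HEADER_LINE ROW_LINE
render_url_template() {
  local template=$1 placeholder i
  local -a cols vals
  IFS=',' read -r -a cols <<< "$2"
  IFS=',' read -r -a vals <<< "$3"
  for i in "${!cols[@]}"; do
    placeholder="{${cols[i]}}"
    # Quoting the pattern and replacement makes both literal.
    template=${template//"$placeholder"/"${vals[i]}"}
  done
  printf '%s\n' "$template"
}

render_url_template 'https://api.example.com/users/{user_id}/orders' 'user_id,name' '42,Ada'
# prints https://api.example.com/users/42/orders
```

Columns that have no placeholder in the template (name, above) are simply ignored.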

By default fetch writes one minified JSON response per line (JSONL). With --new-column COL, it adds the response (or a --jaq-extracted value) as a new column to the original CSV.

Example: enrich a list of US ZIP codes with city/state (Zippopotamus API)

# data.csv has a "URL" column with values like
#   https://api.zippopotam.us/us/90210
qsv fetch URL data.csv \
  --new-column CityState \
  --jaq '[ .places[0]."place name", .places[0]."state abbreviation" ]' \
  > data_with_city_state.csv

Example: URL template — fetch GitHub stargazers for any repo

# repos.csv has a "repo" column with values like "dathere/qsv"
qsv fetch --url-template 'https://api.github.com/repos/{repo}' \
  --new-column stars \
  --jaq '.stargazers_count' \
  --http-header "Authorization: Bearer $GITHUB_TOKEN" \
  repos.csv > repos_with_stars.csv
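Whether $GITHUB_TOKEN reaches qsv as a real token depends on shell quoting, not on qsv: the shell expands variables inside double quotes but passes single-quoted text through literally. This bash sketch shows the difference; the token value is a placeholder.

```bash
#!/usr/bin/env bash
# Placeholder token for the demo; in practice it comes from your environment.
GITHUB_TOKEN=example-token

single='Authorization: Bearer $GITHUB_TOKEN'   # no expansion: literal "$GITHUB_TOKEN"
double="Authorization: Bearer $GITHUB_TOKEN"   # expanded: "Bearer example-token"

printf '%s\n' "$single" "$double"
# Authorization: Bearer $GITHUB_TOKEN
# Authorization: Bearer example-token
```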

Example: fetch NOAA GHCN-Daily station data with a persistent disk cache

# stations.csv has a "url" column with NOAA station data URLs
qsv fetch url stations.csv \
  --disk-cache \
  --disk-cache-dir ~/.qsv-cache/noaa \
  --new-column raw_data > stations_with_data.csv

The disk cache means a second run reuses the previous downloads — useful for incremental ETL.
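Why a second run is cheap can be sketched in bash as a URL-keyed disk cache: hash the URL, use the hash as a file name, and only hit the network when that file is missing. Everything here is an illustration under assumptions: fake_fetch is a hypothetical stand-in for a real HTTP GET so the sketch runs offline, the cache lives in a throwaway mktemp directory, it relies on GNU coreutils sha256sum, and qsv's actual on-disk cache format and key scheme are not shown.

```bash
#!/usr/bin/env bash
# Sketch only: URL-keyed disk cache (not qsv's cache format).
CACHE_DIR=$(mktemp -d)             # throwaway stand-in for a cache directory
trap 'rm -rf "$CACHE_DIR"' EXIT    # clean up when the script ends
FETCH_CALLS=0                      # how many times the "network" was used

# fake_fetch URL: offline stand-in for an HTTP GET.
fake_fetch() {
  FETCH_CALLS=$((FETCH_CALLS + 1))
  printf '{"url":"%s"}\n' "$1"
}

# cached_fetch URL: serve from disk if present, otherwise fetch and store.
cached_fetch() {
  local key path
  key=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)   # stable file name per URL
  path="$CACHE_DIR/$key"
  if [[ ! -f $path ]]; then
    fake_fetch "$1" > "$path"                           # cache miss: fetch once
  fi
  cat "$path"                                           # hit or freshly stored body
}

cached_fetch https://api.zippopotam.us/us/90210 > /dev/null
cached_fetch https://api.zippopotam.us/us/90210 > /dev/null
echo "fetches: $FETCH_CALLS"   # prints fetches: 1
```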

Example: rate-limit to 10 requests/sec with automatic back-off on RateLimit headers

qsv fetch URL data.csv --rate-limit 10 --new-column response > with_response.csv
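A fixed --rate-limit N amounts to spacing requests at least 1/N seconds apart. The bash sketch below shows that arithmetic (rounded up to whole milliseconds so the rate is never exceeded) plus a loop that sleeps between stand-in requests. The function names are hypothetical, the sketch treats 0 as "no fixed delay", and it leaves out the header-driven back-off that qsv layers on top.

```bash
#!/usr/bin/env bash
# Sketch only: fixed-rate spacing between requests (not qsv's scheduler).

# throttle_interval_ms RATE: minimum gap in ms for RATE requests/second.
throttle_interval_ms() {
  local rate=$1
  if (( rate <= 0 )); then
    echo 0                                 # this sketch treats 0 as unthrottled
  else
    echo $(( (1000 + rate - 1) / rate ))   # ceiling division: never exceed RATE
  fi
}

# run_throttled RATE URL...: "request" each URL, sleeping between them.
run_throttled() {
  local rate=$1 ms url
  shift
  ms=$(throttle_interval_ms "$rate")
  for url in "$@"; do
    printf 'GET %s\n' "$url"               # stand-in for the real HTTP request
    sleep "$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
  done
}

throttle_interval_ms 10   # prints 100
```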

Example: use Redis as the cache (shared across machines / CI runs)

qsv fetch URL data.csv \
  --redis-cache \
  --redis-cache-conn redis://my-redis-host:6379 \
  --new-column response > out.csv

Example: report mode — write a per-row HTTP status / timing TSV instead of merging into output

qsv fetch URL data.csv --report > report.tsv
# Columns: row,url,status,response_time_ms,...

See also: /docs/help/fetch.md, docs/Fetch.md, jaq, Recipe: Fetch & Cache.

fetchpost

HTTP POST per row. Two body styles:

  • HTML form encoding (default, Content-Type: application/x-www-form-urlencoded): each column listed in <column-list> becomes one form field.
  • MiniJinja-templated payload (--payload-tpl <file>): render any content type. Default is application/json; override with --content-type.

Example: POST each row to an ML inference endpoint as JSON

{# payload.j2 #}
{
  "model": "classifier-v3",
  "features": {
    "text": {{ text | tojson }},
    "country": {{ country | tojson }}
  }
}
qsv fetchpost https://ml.example.com/infer \
  --payload-tpl payload.j2 \
  --new-column prediction \
  --jaq '.label' \
  feedback.csv > feedback_classified.csv
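The | tojson filter in payload.j2 matters because row values can contain quotes, backslashes, or newlines that would otherwise break the JSON body. The bash sketch below hand-renders a minified equivalent of that payload for one row to show the escaping. json_string and render_payload are hypothetical helpers that escape only backslash, double quote, newline, carriage return, and tab; MiniJinja's tojson handles all control characters and may escape additional characters, so the output is not guaranteed to be byte-identical.

```bash
#!/usr/bin/env bash
# Sketch only: what payload.j2 produces for one row, minified.

# json_string VALUE: print VALUE as a JSON string literal, quotes included.
json_string() {
  local s=$1
  s=${s//'\'/'\\'}      # backslash first, so later escapes are not doubled
  s=${s//'"'/'\"'}      # double quote
  s=${s//$'\n'/'\n'}    # newline
  s=${s//$'\r'/'\r'}    # carriage return
  s=${s//$'\t'/'\t'}    # tab
  printf '"%s"' "$s"
}

# render_payload TEXT COUNTRY: one row's values -> the JSON body.
render_payload() {
  printf '{"model":"classifier-v3","features":{"text":%s,"country":%s}}\n' \
    "$(json_string "$1")" "$(json_string "$2")"
}

render_payload 'He said "great app"' 'US'
# prints {"model":"classifier-v3","features":{"text":"He said \"great app\"","country":"US"}}
```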

Example: POST form-encoded data (simpler — no template file)

qsv fetchpost https://api.example.com/submit name,email,score \
  --new-column response_id \
  --jaq '.id' \
  responses.csv > responses_logged.csv

The name, email, and score columns are sent as name=...&email=...&score=... in the request body, with each value percent-encoded.
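For intuition on what that form-encoded body looks like, here is a bash sketch that builds it from name=value pairs, assuming the WHATWG application/x-www-form-urlencoded rules: ASCII letters, digits, and *-._ pass through, a space becomes +, and every other byte is percent-encoded. form_encode and form_body are hypothetical helpers; qsv's HTTP client does its own encoding, and this sketch has only been reasoned through for ASCII input.

```bash
#!/usr/bin/env bash
# Sketch only: build an application/x-www-form-urlencoded body.

# form_encode VALUE: percent-encode VALUE byte by byte.
form_encode() {
  local LC_ALL=C s=$1 out="" c i        # C locale: iterate over bytes
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9*._-]) out+=$c ;;       # unreserved: keep as-is
      ' ') out+='+' ;;                  # space becomes +
      *) printf -v c '%%%02X' "'$c"     # anything else: %XX of the byte value
         out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# form_body NAME=VALUE...: encode each pair and join them with '&'.
form_body() {
  local pair body="" sep=""
  for pair in "$@"; do
    body+="$sep$(form_encode "${pair%%=*}")=$(form_encode "${pair#*=}")"
    sep='&'
  done
  printf '%s\n' "$body"
}

form_body 'name=Ada Lovelace' 'email=ada@example.com' 'score=97'
# prints name=Ada+Lovelace&email=ada%40example.com&score=97
```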

Example: bulk OCR on a CSV of image URLs (rate-limited + cached)

qsv fetchpost https://ocr.example.com/extract image_url \
  --rate-limit 5 \
  --disk-cache \
  --new-column extracted_text \
  --jaq '.text' \
  images.csv > images_with_text.csv

Example: custom content type (e.g., XML)

qsv fetchpost https://soap.legacy.example.com/api \
  --payload-tpl xml_envelope.j2 \
  --content-type 'application/xml' \
  --new-column result \
  records.csv

--jaq is left out here because it queries JSON responses, and a SOAP endpoint replies with XML.

See also: /docs/help/fetchpost.md, docs/Fetch.md, MiniJinja, template.

Caching strategy

Both fetch and fetchpost share the same caching options. Pick based on your access pattern:

| Cache | Best for | Notes |
|---|---|---|
| In-memory LRU (default) | One-shot runs, small datasets | Lost when the process exits; 2M entries by default; tune with --mem-cache-size |
| Disk (--disk-cache) | Repeated runs against stable APIs | Stored at ~/.qsv-cache/fetch by default; configurable TTL |
| Redis (--redis-cache) | Distributed teams, CI/CD | Shared cache across machines; needs a Redis server |
| No cache (--no-cache) | Live data that must not be cached | Pricing endpoints, real-time stock data, etc. |

For details on cache invalidation, the disk-cache TTL, and Redis connection strings, see docs/Fetch.md.
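The default in-memory cache evicts the least recently used entry once it is full. The bash sketch below shows that eviction rule with a hypothetical capacity of 2 (qsv's default is 2M entries) and hypothetical function names and cached values; it is a linear-time illustration of LRU, not qsv's data structure. lru_get returns its value in REPLY rather than on stdout so the recency update is not lost in a subshell.

```bash
#!/usr/bin/env bash
# Sketch only: least-recently-used eviction (bash 4+ for declare -A).
declare -A LRU_VAL=()   # key -> cached response
LRU_ORDER=()            # keys, least recently used first
LRU_CAP=2               # tiny capacity to make eviction visible

# lru_touch KEY: move KEY to the most-recently-used end of LRU_ORDER.
lru_touch() {
  local k=$1 x
  local -a kept=()
  for x in "${LRU_ORDER[@]}"; do
    [[ $x == "$k" ]] || kept+=("$x")
  done
  LRU_ORDER=("${kept[@]}" "$k")
}

# lru_get KEY: on a hit, set REPLY and mark KEY as recently used; else return 1.
lru_get() {
  [[ -n ${LRU_VAL[$1]+set} ]] || return 1
  lru_touch "$1"
  REPLY=${LRU_VAL[$1]}
}

# lru_put KEY VALUE: store, then evict the oldest key if over capacity.
lru_put() {
  local oldest
  LRU_VAL[$1]=$2
  lru_touch "$1"
  if (( ${#LRU_ORDER[@]} > LRU_CAP )); then
    oldest=${LRU_ORDER[0]}
    unset "LRU_VAL[$oldest]"
    LRU_ORDER=("${LRU_ORDER[@]:1}")
  fi
}

lru_put /us/90210 resp-90210
lru_put /us/10001 resp-10001
lru_get /us/90210              # hit: /us/90210 becomes most recent
lru_put /us/60601 resp-60601   # evicts /us/10001, the least recent
```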

See also
