Release v0.42.0 #11339
Draft
lidel wants to merge 35 commits into
* chore: prepare master for v0.42.0-dev cycle
* chore: add v0.43 changelog stub
* feat(fuse): accurate st_blocks and st_blksize
Populate st_blocks from the UnixFS file size and advertise a chunk-aligned st_blksize so du, ls -s, and stat report real numbers on all three mounts.
- fuse/mount/stat.go: new SizeToStatBlocks, DefaultBlksize (1 MiB), BlksizeFromChunker
- fuse/readonly: fillAttr sets blocks and blksize for files, raw nodes, symlinks, directories
- fuse/writable: Config.Blksize field + effectiveBlksize fallback; Dir, FileInode, and Symlink fillAttr populate stat fields
- fuse/mfs, fuse/ipns: pass Import.UnixFSChunker into Config.Blksize via BlksizeFromChunker
- tests: BlksizeFromChunker parser, DefaultBlksize anchor, effectiveBlksize zero-fallback, TestStatBlocks subtests for files, directories, symlinks on both mounts
- docs/changelogs/v0.41.md: FUSE Mount Improvements entry
* refactor(fuse): tighten st_blksize plumbing
cap st_blksize at 16 MiB so a pathological `Import.UnixFSChunker` cannot push tools into multi-GiB per-read buffers, and parse the size suffix as uint64 so all valid numeric inputs clamp uniformly instead of silently falling back past uint32. normalize Blksize once in writable.NewDir so fillAttr reads Cfg.Blksize directly, dropping the per-call effectiveBlksize method. drop unreachable size-zero guard in fusetest.AssertStatBlocks.
* refactor(fuse): cap st_blksize at fuse.MAX_KERNEL_WRITE
Drop the arbitrary 16 MiB MaxBlksize ceiling and clamp directly to go-fuse's MAX_KERNEL_WRITE (1 MiB on Linux v4.20+). Hinting past this ceiling is wasted because the kernel splits any larger userspace read/write into MAX_KERNEL_WRITE-sized FUSE ops regardless.
* fix(fuse): gate stat helpers to fuse-supported platforms
stat.go imports go-fuse, which only builds on linux/darwin/freebsd. without a build tag it broke cross-compilation for openbsd.
* docs(fuse): clarify st_blocks/st_blksize rationale
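The stat arithmetic described in these commits can be sketched as follows. This is a minimal illustration, not the code in fuse/mount/stat.go: `sizeToBlocks` and `clampBlksize` are hypothetical names, and the sketch assumes st_blocks is counted in 512-byte units (as on Linux) and rounded up.

```go
package main

import "fmt"

// statUnit is the unit Linux uses for st_blocks: 512 bytes, independent of
// the preferred I/O size reported in st_blksize.
const statUnit = 512

// sizeToBlocks is a hypothetical stand-in for SizeToStatBlocks. It assumes
// the helper rounds a byte size up to whole 512-byte units.
func sizeToBlocks(size uint64) uint64 {
	blocks := size / statUnit
	if size%statUnit != 0 {
		blocks++
	}
	return blocks
}

// clampBlksize returns def when there is no usable hint (0) and caps any
// hint at ceiling, mirroring the "default plus ceiling" behavior described
// in the commit messages.
func clampBlksize(hint, def, ceiling uint64) uint64 {
	if hint == 0 {
		return def
	}
	if hint > ceiling {
		return ceiling
	}
	return hint
}

func main() {
	const mib = 1 << 20
	fmt.Println(sizeToBlocks(2 * mib))          // 2 MiB file in 512-byte units
	fmt.Println(clampBlksize(0, mib, mib))      // no hint: fall back to default
	fmt.Println(clampBlksize(16*mib, mib, mib)) // oversized hint: clamp to ceiling
}
```

Rounding up means a 1-byte file still reports one block, which is what tools like du expect from a non-empty file.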
Reduce example-test flakiness and align with current kubo conventions.
- move peer connection before ipfsA.Unixfs().Add() so bitswap's peer-connected event settles before Part IV fetches; on a slow CI runner the 1s ProvSearchDelay can expire and fail the test
- use Ed25519 keys via CreateIdentity, matching `ipfs init` default
- comments are concise and avoid jargon
bump go-libp2p-kad-dht to v0.39.1 to pick up the queryPeer AddrInfo.Addrs race fix from libp2p/go-libp2p-kad-dht#1244, which resolves the random daemon crashes reported in #11287 and #11116.
* chore: bump p2p-forge to v0.8.0 aligns coredns and gorilla/websocket with go-libp2p v0.48.0 / quic-go v0.59.0 already in kubo, and pulls grpc v1.79.3 to clear CVE-2026-33186 from SBOM scanners (not exploitable in p2p-forge at runtime). Closes #11283 * docs: document HTTP(S) proxy env vars cover HTTPS_PROXY, HTTP_PROXY, NO_PROXY in environment-variables.md, listing every path that honors http.ProxyFromEnvironment (RPC, update, delegated routing, bitswap HTTP, autoconf, AutoTLS/ACME, libp2p ws/wss). add a v0.41 changelog highlight pointing at the new section and noting that the websocket transport now accepts https:// proxy URLs.
* docs(server): document defaultServerFilters with RFC references Rework the `server` profile, `Addresses.NoAnnounce`, and `Swarm.AddrFilters` docs to make the default filter list scrutable, and annotate each entry in `defaultServerFilters` with its RFC origin. - profile.go: per-entry RFC inline comments on defaultServerFilters; godoc points at IANA special-purpose registries and cautions that changes here affect every server-profile user. - config.md NoAnnounce/AddrFilters: active-voice rewrites; cross-link publish-side and dial-side filters; consolidated tip pointing to the server profile section. - config.md server profile: IPv4 and IPv6 prefix tables with RFC references (multiaddr ipcidr notation); scenarios table for overriding specific entries; prose section for optional entries (IPv4 loopback, IPv6 outside 2000::/3) with trade-offs, motivated by loopback and unallocated IPv6 leaking into DHT announces since go-libp2p v0.47. * feat(server): strip loopback and non-public IPv6 from announces v0.40 switched libp2p to enumerate all interface addresses, which started leaking loopback, unallocated IPv6 space (e.g. 1e::/16), and other non-globally-reachable addresses into DHT and identify records of public IPFS nodes including bootstrap peers. adds three entries to defaultServerFilters applied to both Swarm.AddrFilters and Addresses.NoAnnounce: - /ip4/127.0.0.0/ipcidr/8: IPv4 loopback (RFC 1122) - /ip6/::1/ipcidr/128: IPv6 loopback (kept for documentation; subset of ::/3) - /ip6/::/ipcidr/3: everything outside global unicast 2000::/3 docs/config.md: overhaul server-profile section with per-entry RFC references, override guidance for Yggdrasil/NAT64/co-located loopback, and notes on /ip6/::/ipcidr/3 blast radius. docs/changelogs/v0.42.md: highlight entry with upgrade instructions for operators who already applied server profile before v0.42. * docs(server): sort filters, correct ::/3 wording The /ip6/::/ipcidr/3 CIDR matches only the IANA-reserved 0000::/3 block. 
Prior wording "everything outside global unicast 2000::/3" implied wider coverage; other non-2000::/3 blocks are IANA-reserved or already covered by fc00::/7 and fe80::/10, so behavior is unchanged. Also sort new entries into their numeric positions within the IPv4 and IPv6 blocks in both profile.go and the config.md tables. * docs(v0.41): server profile filter highlight Move the server profile changelog entry from v0.42 to v0.41 since the fix ships in v0.41. Also rewrite to lead with the concrete filter list addition, link to the server profile docs section for full details and override guidance, and warn that applying the profile disables LAN and localhost peer discovery.
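The corrected wording about `/ip6/::/ipcidr/3` can be checked with the standard library. The sketch below uses `net/netip` rather than multiaddr parsing; the sample addresses are illustrative choices, not taken from the kubo test suite.

```go
package main

import (
	"fmt"
	"net/netip"
)

// coveredBy reports whether ip falls inside the CIDR prefix.
func coveredBy(prefix, ip string) bool {
	return netip.MustParsePrefix(prefix).Contains(netip.MustParseAddr(ip))
}

func main() {
	// ::/3 matches addresses whose top three bits are 000. Loopback and the
	// unallocated 1e::/16 example fall inside it; global unicast (2000::/3),
	// unique-local (fc00::/7), and link-local (fe80::/10) do not.
	for _, ip := range []string{"::1", "1e::1", "2001:db8::1", "fc00::1", "fe80::1"} {
		fmt.Printf("%-12s in ::/3: %v\n", ip, coveredBy("::/3", ip))
	}
}
```

This is why the separate `fc00::/7` and `fe80::/10` entries are still needed alongside `::/3`.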
Provide/reprovide messages from core/node/provider.go were emitted under core:constructor (the shared core/node constructor subsystem), making GOLOG_LOG_LEVEL and `ipfs log level` hard to target for provide visibility. Scope them to "provider", matching boxo's provider package so a single lever covers both layers.
- core/node/provider.go: new providerLog at the "provider" subsystem, applied to 25 keystore/reprovide/strategy/throughput call sites
- test/cli/provider_test.go: reprovide dedup subtest raises provider=info instead of core:constructor=info
- docs/debug-guide.md: new "Known logger subsystems" section listing provider, dht/provider, dht/provider/lan, dsqueue
- docs/environment-variables.md: link to the new section from under GOLOG_LOG_LEVEL
* Upgrade to Boxo v0.39.0
* chore: bump boxo to ipfs/boxo#1140 picks up dspinner fix that snapshots the index before emitting pins, avoiding the streaming lock convoy. * docs: changelog entry for pinner stall fix * docs: clarify pinner snapshot behavior * chore: bump boxo to include ipfs/boxo#1146 Picks up the fix for "panic: pebble: closed" on shutdown (#11292): the dspinner streamIndex goroutine now recovers from any datastore panic and reports it as an error on the output channel, so the daemon exits cleanly instead of crashing when the datastore closes before pin enumeration drains. * fix(provider): quiet keystore-close on shutdown When the daemon shuts down, the keystore Close fires while the startup sync goroutine may still be in flight: the OnStart ctx is not yet cancelled, so ResetCids returning keystore.ErrClosed gets logged at Error as "sync failed". Treat keystore.ErrClosed the same as a cancelled ctx and log at Debug as "interrupted by shutdown". Apply the same rule to the periodic reprovide GC loop (whose error log got a unified message in the process). * test(cli): keystore-close log + pin ls shutdown Adds TestProviderKeystoreSyncShutdownQuiet, a CLI test that: 1. Verifies no shutdown-caused keystore-sync error (err="keystore is closed" or err="context canceled") is logged at Error level. Scans stderr line-by-line so unrelated Error logs (e.g. "reset already in progress" from the startup+periodic overlap at tight Intervals) do not false-positive the assertion. 2. Runs `ipfs pin ls --stream` against the live daemon, shuts the daemon down mid-stream, and asserts the CLI returns within 15s, does not observe a daemon panic, and produces a meaningful error message if it exited non-zero. Uses Provide.DHT.Interval=10ms so the periodic reprovide loop is always inside ResetCids when StopDaemon fires, making the shutdown race deterministic enough to catch the regression on most runs (verified empirically against the pre-fix provider.go).
# Conflicts: # docs/changelogs/v0.41.md # version.go
Merge release v0.41.0
0.41.0's httpRouterAddrFunc only resolved 0.0.0.0/:: when AutoNATv2 had a confirmed reachable address. Otherwise it forwarded raw Addresses.Swarm strings to HTTP routers, so isolated or LAN-only nodes published unreachable provider records.
- core/node/libp2p/routingopt.go: fallback now calls host.Addrs(), which resolves wildcard binds to concrete interface addrs and applies the libp2p AddrsFactory (NoAnnounce CIDR, Swarm.AddrFilters); matches the DHT provide path (core/node/provider.go selfAddrsFunc)
- core/node/libp2p/routingopt_test.go: stubHost.Addrs is configurable; cases rewritten around resolved host addrs, with a new case pinning that NoAnnounce CIDR filtering belongs upstream in host.Addrs
- test/cli/delegated_routing_v1_http_client_test.go: new end-to-end case asserts provider records sent over HTTP never contain 0.0.0.0 or :: when Addresses.Swarm uses the default wildcard bind
Fixes #11213
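The end-to-end assertion (no wildcard address in published records) amounts to a check like the one below. This is a rough sketch: `hasUnspecifiedIP` is a hypothetical helper that splits the multiaddr string by hand instead of using go-multiaddr, and it is not the code used in the kubo test.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// hasUnspecifiedIP reports whether a multiaddr-style string such as
// "/ip4/0.0.0.0/tcp/4001" carries a wildcard (unspecified) IP component.
// Hypothetical helper: real code would parse with go-multiaddr.
func hasUnspecifiedIP(maddr string) bool {
	parts := strings.Split(maddr, "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] != "ip4" && parts[i] != "ip6" {
			continue
		}
		ip, err := netip.ParseAddr(parts[i+1])
		if err == nil && ip.IsUnspecified() {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{
		"/ip4/0.0.0.0/tcp/4001",
		"/ip6/::/udp/4001/quic-v1",
		"/ip4/192.168.1.5/tcp/4001",
	} {
		fmt.Printf("%-28s wildcard=%v\n", a, hasUnspecifiedIP(a))
	}
}
```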
These tests verify behavior that is independent of who serves the release JSON: TestUpdate exercises the `ipfs update` command tree, and TestUpdateWhileDaemonRuns checks that read-only subcommands still work while the daemon holds the repo lock. They hit the real GitHub Releases API only by accident, which makes them flake on rate limits, transient 5xx, or release-asset upload races. A flake panics the harness and takes every parallel test in test/cli down with it. Replace the network call with a shared `httptest.Server` helper (`newMockGitHubReleases`) and point the spawned binary at it via `TEST_KUBO_UPDATE_GITHUB_URL`, the same hook `TestUpdateInstall` already uses. The mock returns one stable release with a matching binary asset and follows the convention used by real kubo releases: `kubo_<tag>_<os>-<arch>.<ext>`, where ext is `zip` on Windows and `tar.gz` elsewhere. This must match `assetNameForPlatformTag` in `core/commands/update_github.go`, otherwise `findReleaseAsset` reports "no release found with a binary for <os>/<arch>". No network, no token, no flake. Local runtime drops from ~70s to under 1s.
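The asset naming convention the mock has to follow can be sketched as below. `assetName` is a hypothetical illustration of the `kubo_<tag>_<os>-<arch>.<ext>` pattern described above; it is not the `assetNameForPlatformTag` function from `core/commands/update_github.go`.

```go
package main

import (
	"fmt"
	"runtime"
)

// assetName builds a release asset name following the convention
// kubo_<tag>_<os>-<arch>.<ext>, with zip on Windows and tar.gz elsewhere.
// Hypothetical helper written for illustration only.
func assetName(tag, goos, goarch string) string {
	ext := "tar.gz"
	if goos == "windows" {
		ext = "zip"
	}
	return fmt.Sprintf("kubo_%s_%s-%s.%s", tag, goos, goarch, ext)
}

func main() {
	// The asset a mock release would need to advertise for the current platform.
	fmt.Println(assetName("v0.42.0", runtime.GOOS, runtime.GOARCH))
}
```

A mock that derives its asset name from the same platform values as the binary under test avoids the "no release found with a binary" failure mode on any runner.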
Bumps [actions/github-script](https://github.com/actions/github-script) from 8 to 9. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v8...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
* docs(config): clarify BlockKeyCacheSize and BloomFilterSize BlockKeyCacheSize was documented as "size in bytes" but the underlying boxo blockstore wires it directly to lru.New2Q[K,V](size int) which is an entry count, not a byte budget. Fix the unit and add memory sizing guidance (~200 B/entry) plus what the cache actually short-circuits (per-block flatfs Stat on the bitswap server hot path). BloomFilterSize section expanded with: what the filter answers (negative Has only), saturation behavior at runtime growth, startup AllKeysChan rebuild cost (one-time, scales with keyset not data volume), and a cross-link to BlockKeyCacheSize as the complementary positive-path cache. Drop the dead go-ipfs-blockstore link. * docs(datastores): explain flatfs next-to-last/3 for large blockstores The default next-to-last/2 shard depth (~1024 dirs) becomes a per-shard file-count problem on nodes growing past a few million blocks: bulk enumeration (GC, BloomFilterSize rebuild on startup, Provide.Strategy=all reprovider) and per-block Stat both pay readdir cost proportional to files-per-shard. next-to-last/3 (~32k dirs) keeps per-directory counts in a range modern filesystems handle well and is the recommended choice for pinning clusters, public gateways, and mirrors. Note that shard depth is fixed at ipfs init time and re-sharding requires a full export/import. * docs(config): expand BloomFilterSize sizing with bbloom specifics Replace the generic worked example with a power-of-two reference table covering 10M to 500M blocks, and document two kubo-specific behaviors that the generic bloom-filter math does not capture: - ipfs/bbloom rounds the bit count up to the next power of two, so non-power-of-two BloomFilterSize values silently allocate more memory than configured (e.g. the historical 1199120-byte example actually allocates a 2 MiB internal filter). - kubo wires bbloom with k=7 hash positions; the FPR formula is fixed at (1 - exp(-7n/m))^7. 
Memory cost is roughly ~1.2 B/entry at ~1% FPR and scales linearly with target FPR. Add a saturation section showing FPR degradation at 2x / 4x / 8x the design n (~11% / ~58% / >95% respectively), and a Risks subsection clarifying that a poorly sized filter is an operational waste rather than a correctness issue (no false negatives), with a quick bytes-per-block health check. Update the hur.st calculator URL from the n=1e6 default (dev-laptop scale) to n=10e6 (representative of real kubo deployments). Reference sizes verified empirically against ipfs/bbloom v0.1.0: a 16 MiB filter at n=10M gave 0.1875% observed FPR vs 0.18% predicted; the historical ~1.14 MiB worked example at n=1M gave 0.0545% vs 0.054% predicted at the rounded 2 MiB allocation. * docs(config): define FPR up front in BloomFilterSize section The BloomFilterSize section uses "FPR" throughout without defining it. Explain in the intro that the false-positive rate is the probability of a "maybe present" answer for a CID that is not actually in the blockstore, that a false positive costs at most one wasted datastore lookup (no data loss or incorrect retrieval), and that lower FPR means more inbound Has() calls answered from RAM alone. * docs: fold rounding penalty into bloom filter budget byte/entry sizing figures now report the operationally-useful number after bbloom's power-of-two rounding, so an operator following the guidance lands close to the true memory footprint instead of the raw design-point size. 
- config.md (BloomFilterSize): bump byte/entry to ~1.8/2.8/4.2 at ~1% / 0.1% / 0.01% FPR, state the average ~1.5x rounding penalty (worst case ~2x); drop 500M / 1 GiB row whose m/n=17.18 broke the uniform 10.74 ratio of the rest of the table; active-voice and comma fixes in saturation, risks, and startup prose - config.md (BlockKeyCacheSize): split a comma splice; active voice in 2Q replacement description - datastores.md (flatfs): align shard table columns; soften reshard wording to note kubo ships no in-place tool, not that flatfs forbids it
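The sizing figures quoted in these commits follow from two rules stated above: the bit count is rounded up to the next power of two, and the false-positive rate is (1 - exp(-7n/m))^7. The sketch below reproduces the predicted numbers; `roundedBits` and `fpr` are hypothetical names, and the code is a calculator, not part of ipfs/bbloom.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// hashPositions is the k=7 value the commit message says kubo uses.
const hashPositions = 7

// roundedBits converts a configured size in bytes to the bit count after
// rounding up to the next power of two.
func roundedBits(sizeBytes uint64) uint64 {
	m := sizeBytes * 8
	if m <= 1 {
		return 1
	}
	return uint64(1) << bits.Len64(m-1)
}

// fpr evaluates (1 - exp(-k*n/m))^k for n entries and a filter configured
// with sizeBytes, using the rounded bit count as m.
func fpr(n float64, sizeBytes uint64) float64 {
	m := float64(roundedBits(sizeBytes))
	return math.Pow(1-math.Exp(-hashPositions*n/m), hashPositions)
}

func main() {
	// 16 MiB filter at 10M blocks.
	fmt.Printf("16 MiB, n=10M: %.4f%%\n", 100*fpr(10e6, 16<<20))
	// Historical 1199120-byte example at 1M blocks (rounds up to 2 MiB).
	fmt.Printf("1199120 B, n=1M: %d bytes allocated, %.4f%%\n",
		roundedBits(1199120)/8, 100*fpr(1e6, 1199120))
}
```

Evaluating the formula by hand gives roughly 0.18% and 0.054% for these two cases, in line with the predicted values quoted in the commit message.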
A zero-value Multiaddr (since go-multiaddr v0.15 is a slice type) encodes to zero bytes on the wire. AddrsFactory was passing such empty entries through to the host's signed peer record, where peers that skip the empty-input check render them as "/" and reject the address. js-libp2p autonatv2 first flagged this against a kubo/0.39.0/2896aed/docker agent. AddrsFactory is the central chokepoint for kubo's announced addresses, so filtering here scrubs every downstream consumer until the upstream go-libp2p fix lands. See libp2p/js-libp2p#3478 (comment)
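The filtering step itself is small. The sketch below shows the idea with a stand-in type: `addr` is a hypothetical byte-slice type used in place of go-multiaddr's `Multiaddr`, and `dropEmpty` is an illustrative name, not the function added in this commit.

```go
package main

import "fmt"

// addr is a hypothetical stand-in for an encoded multiaddr. The commit
// message notes that a zero-value address encodes to zero bytes.
type addr []byte

// dropEmpty returns addrs without zero-length entries, so an empty address
// can never reach the signed peer record.
func dropEmpty(addrs []addr) []addr {
	out := make([]addr, 0, len(addrs))
	for _, a := range addrs {
		if len(a) == 0 {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	in := []addr{nil, {0x04, 127, 0, 0, 1}, {}}
	fmt.Println(len(in), "->", len(dropEmpty(in)))
}
```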
* docs(server-profile): warn about local reverse proxy gotcha `Swarm.AddrFilters` is consulted on inbound `InterceptAccept` as well as outbound dials, so loopback CIDRs in the filter list cause Kubo to reject every incoming connection from a local nginx or Caddy reverse proxy that fronts a `/ws` (or other libp2p) listener on `127.0.0.1`. The condition is silent: the OS accepts the TCP, then Kubo closes the socket before the libp2p handshake. Add an explicit note to the `Swarm.AddrFilters` section, a new row in the `server` profile override table for the reverse-proxy case, and a matching CAUTION block in the v0.41 changelog. Each pointer says: remove the loopback CIDRs from `Swarm.AddrFilters` only, and keep them in `Addresses.NoAnnounce`. * feat(libp2p): log ERROR for listeners blocked by AddrFilters or NoAnnounce Surface misconfigured listeners at startup and on every libp2p `EvtLocalAddressesUpdated` event, instead of silently dropping incoming connections or staying unadvertised. `findDeadListeners` is a pure function that walks the host's resolved listen addresses (the output of `host.Network().InterfaceListenAddresses()`, matching the post-resolution view used in #11297 for `host.Addrs()`) and matches each IP component against every CIDR rule in `Swarm.AddrFilters` and `Addresses.NoAnnounce`. Working from resolved addresses means wildcard listens like `/ip4/0.0.0.0` and `/ip6/::` are already expanded to concrete interface addresses, so the check does not flag a listener just because the unspecified address itself happens to fall inside a filter CIDR (for example `::` is in `::/3` even though the listener still accepts inbound from globally-routable peers). `MonitorDeadListeners` wires the check into fx: it runs once at startup, subscribes to `event.EvtLocalAddressesUpdated`, and re-runs the check whenever the host's address set changes (NAT mapping comes online, new interface, AutoTLS cert ready). 
Findings are deduplicated against the previous run so a stable misconfiguration is logged once until it is resolved or a new finding shows up. Loopback `Addresses.NoAnnounce` matches are skipped on the grounds that suppressing loopback advertisement is operator-intent on every `server`-profile node, not a misconfiguration. Loopback in `Swarm.AddrFilters` is the bug pattern that motivated this check; that match is always reported. Each ERROR line names the offending listener, the matching CIDR rule, and the field to remove the rule from to revive the listener: Addresses.Swarm listener "/ip4/127.0.0.1/tcp/8081/ws" matches Swarm.AddrFilters rule "/ip4/127.0.0.0/ipcidr/8", so Kubo rejects every incoming connection to it. Remove "/ip4/127.0.0.0/ipcidr/8" from Swarm.AddrFilters to allow connections to this listener.
* chore(deps): align deps with ipfs/boxo#1152 Bumps boxo to the head of ipfs/boxo#1152 and lands the kubo-only direct deps from this week's dependabot batch in one go so go.mod stays consistent. Direct: - boxo: v0.39.0 to ipfs/boxo#1152 head - libp2p-pubsub: 0.15.0 to 0.16.0 - fsnotify: 1.9.0 to 1.10.0 - go-fuse/v2: 2.9.1-pre to 2.10.1 - otelhttp: 0.67.0 to 0.68.0 - otel, otel/sdk, otel/sdk/metric, otel/trace: 1.42.0 to 1.43.0 - otel/exporters/prometheus: 0.56.0 to 0.65.0 - contrib/propagators/autoprop: 0.46.1 to 0.68.0 Pulled in via boxo: zap 1.28.0, go-unixfsnode 1.10.4. Skipped: cheggaaa/pb v1 to v2 (incompatible API; v2 drops pb.U_BYTES and pb.New64(...).SetUnits, breaking the progress bar usage in core/commands/{cat,add,get,dag/export}.go). Supersedes #11306, #11307, #11308, #11309, #11311, #11312. * fix(metrics): drop otel_scope_info, expose scope as labels The otel prometheus exporter v0.59.0 stopped emitting the standalone otel_scope_info metric. Scope identity is now carried by otel_scope_name, otel_scope_version, and otel_scope_schema_url labels on every metric, added in v0.58.0. The bump to v0.65.0 in this branch crosses that boundary, so the t0119 baseline failed. Update the sharness baseline, docs/metrics.md, and add a v0.42 changelog highlight so operators scraping otel_scope_info know to switch their dashboards to the per-metric labels. * chore(deps): bump boxo to main (incl. ipfs/boxo#1152) boxo@main now includes ipfs/boxo#1152, replacing the temporary PR-pinned revision used in 681a4b9.
denylists only block content retrieval and local IPNS resolution. they do not stop a DHT server from storing or serving provider and IPNS records for denied keys on behalf of other peers, and they do not gate /routing/v1/ responses. document this explicitly and point operators at Routing.Type=autoclient as the way to opt out of acting as a routing intermediary for blocked content. Closes #11317 Closes #11318 Closes #11319 these issues track the implementation work to push denylists into the kad-dht provider store, the IPNS validator and pubsub path, and the /routing/v1/ HTTP layer. until that lands, autoclient is the only operator-facing knob with the same effect, so the docs need to say so.
This upgrades the pebble database to v2.1.5.
* update go-log to v2.9.2
* feat(pinner): close pinner before repo on shutdown The pinner's streaming goroutines hold a reference to the backing datastore, and pebble panics on use after Close. Before this change the panic was recovered inside the pinner (see ipfs/boxo#1146) and the symptom was only a transient log trace on daemon exit, but the race remained. Register a new fx OnStop hook that calls pinner.Close before the repo (and therefore the datastore) closes. Close drains all in-flight stream goroutines, so the datastore is closed only after the pinner is fully quiesced. Bumps boxo to pick up Pinner.Close from ipfs/boxo#1150. Fixes #11292 * chore(deps): bump boxo to ipfs/boxo#1150 (70ffcfa) * chore(deps): bump boxo to ipfs/boxo#1150 (75481f4) ipfs/boxo#1150 was reworked to use context fan-out instead of a done channel. Pinner.Close now cancels every admitted op and waits for them to return, broadening the shutdown contract from "drain streams" to "drain everything". Comments and changelog reworded to match. * chore(deps): bump boxo to latest main (b2b5d8a)
* feat: bound graceful shutdown, add diag healthy
Replace unbounded app.Stop(context.Background()) with a deadline-bounded
context driven by a new Internal.ShutdownTimeout config (default 12h,
0 disables). Add an os.Exit(1) watchdog at the same deadline so an FX
OnStop hook that never returns can no longer hang the daemon.
Add ipfs diag healthy: fails when shutdown has been initiated or when
the DAG pipeline cannot resolve the well-known empty-directory CID.
Dockerfile HEALTHCHECK now uses it so orchestrators recycle half-
shutdown daemons.
- core/shutdown: new pkg; atomic startedAt + CloseWithCtx helper
- core/builder.go: app.Stop bounded by ShutdownTimeout
- cmd/ipfs/kubo/daemon.go: watchdog + MarkStarted on signal
- core/commands/diag.go: new healthy subcommand
- core/node/{bitswap,libp2p/host,libp2p/routing}.go: OnStop hooks wrapped
- config/internal.go: ShutdownTimeout + DefaultShutdownTimeout=12h
- Dockerfile: HEALTHCHECK uses "ipfs diag healthy"
- docs/{config,changelogs/v0.42}.md: documented
- test/cli: enabled + disabled path tests
* feat: bound provider stats and ADD_PROVIDER sends
bumps go-libp2p-kad-dht past v0.39.2 to b73e1e8 to pick up two
related provider bug fixes.
- ipfs provide stat now honors client cancellation and deadlines
instead of blocking indefinitely behind a slow keystore lookup
- adds Provide.DHT.SendProviderRecordTimeout capping each
ADD_PROVIDER RPC so unresponsive peers cannot pin a provide
worker and stall reprovide cycles
- internal reprovide-alert poller bounds its Stats call so a
hung keystore.Size cannot delay shutdown
* test(shutdown): use synctest for timeout test, document sleep
CloseWithCtx_timesOut now runs in a synctest bubble so the deadline
assertion is exact (no wall-clock slack), and the simulated close uses
a release channel to drain the bubble cleanly after the leak point.
The two happy-path tests stay unchanged because their close funcs
return immediately and gain nothing from a fake clock.
Comment the 2ms sleep in TestMarkStartedPreservesFirstTimestamp so
its role (forcing time.Now() to advance between the two MarkStarted
calls so a CAS to Store regression is detectable) is not lost.
Addresses #11329 (review).
* fix(pinner): bound pinner Close with shutdown deadline
The boxo Pinner.Close contract notes that an in-flight op ignoring
its ctx (a downstream bug) can block Close, so the host must bound
it at the call site. Wrapping the OnStop hook with CloseWithCtx
honors Internal.ShutdownTimeout and surfaces an actionable
"subsystem 'pinner' failed to close" log on hang instead of leaving
only the watchdog os.Exit(1) trace.
* fix(shutdown): bound remaining I/O-touching OnStop hooks
Wrap the OnStop hooks whose Close can plausibly block on disk or
network: repo (datastore flush + lock release), mfs-root (datastore
writes via DAGService), peering (waits on libp2p peer goroutines),
legacy-provider (in-flight reprovide RPCs), and the dht-provider
plus keystore pair under SweepingProvider.
In-memory closes (blockservice, peerstore, resource-manager) are
left as-is since they cannot realistically hang.
For the dht-provider/keystore pair, provider closes first so nothing
can access the keystore afterwards. If the shutdown ctx fires
mid-provider-drain, the keystore close sees an expired ctx and
returns immediately; the watchdog os.Exit(1) is the ultimate
backstop, and keystore writes are fsync'd on put so missing the
explicit close is recoverable on next boot.
* fix(shutdown): bound remaining in-memory OnStop hooks
Wrap blockservice, peerstore, and resource-manager Close hooks with
CloseWithCtx for uniformity. These are pure in-memory operations
unlikely to hang in practice, but wrapping costs nothing and makes
the shutdown audit trail uniform: every OnStop hook now honors the
deadline and surfaces a named subsystem on timeout.
* fix(shutdown): bound autoRelayFeeder OnStop on ctx
OnStop waited on the feeder goroutine via <-done without honoring
the shutdown ctx. The goroutine itself selects on ctx in every
loop case, so cancel() normally suffices, but a stuck downstream
dht.WAN.GetClosestPeers that ignored its ctx could block fx.Stop
indefinitely. Adding the ctx.Done() select case mirrors the
reprovideAlert pattern in provider.go and lets the shutdown
deadline reclaim control even with a misbehaving DHT.
* docs(changelog): merge shutdown entries into one user-facing section
Combine the pinner-on-shutdown paragraph with the bounded-shutdown
section under a single "Reliable shutdown and container health checks"
heading. Lead with the visible symptoms (half-shutdown daemons,
healthy-but-dead container reports, manual docker restart) instead of
fx OnStop jargon. Frame Internal.ShutdownTimeout as a
belt-and-suspenders ceiling, with the 12-hour default sized against
the 22-hour DHT provider record expiration.
…11321) * feat(provide): add ipfs provide once for ad-hoc announcements Adds an experimental subcommand that submits provider records for the given CIDs through the provider system right away, without waiting for the next reprovide cycle. Use -r to walk the DAG and announce every reachable block. Designed against the sweep provider (the default since v0.39): StartProviding queues to the burst-provide workers, which publish records to the DHT efficiently. Works with the legacy provider too, though it queues into the slower serial worker pool. CIDs must already exist in the local blockstore. Re-announcement on the regular schedule is governed by Provide.Strategy and Provide.DHT.Interval; this command does not change either. * refactor(routing): deprecate ipfs routing provide Marks `ipfs routing provide` as deprecated and points users at the new `ipfs provide once`. The command keeps its existing Run, Encoders, and flags so existing scripts continue to work; only the status flag and helptext change. * docs(routing): clarify when ipfs routing reprovide applies Tightens the helptext and the sweep-mode error message so the constraint is obvious: this command only triggers a cycle on the legacy provider, and points users at 'ipfs provide stat --all' for monitoring the default sweep schedule. * docs: tighten provide helptext and update routing-provide references Updates docs/config.md and docs/experimental-features.md to reference 'ipfs provide once' instead of 'ipfs routing provide'. Tightens the helptext for 'ipfs provide clear' and the 'ipfs provide stat' overview: drops headings around short paragraphs, prefers active voice, and notes that the sweep provider is the default. 
* docs: changelog entry for ipfs provide once * docs: use 'provide system' wording consistently * test(provide): cover --recursive and multi-CID paths for provide once Adds two subtests under runProviderSuite (run for both Legacy and Sweep): - --recursive walks the DAG and announces every chunk of a 2 MiB file added with --pin=false under Provide.Strategy=roots, so the auto- provide path stays out of the way. - multiple CIDs in a single invocation succeed and the text encoder reports 'queued 3 CID(s) for immediate provide'. * feat(provide): stream cids and per-cid output for ipfs provide once Each CID flows through the command independently, so stdin can be piped without buffering and consumers see results as they happen. - Run reads CIDs from argv and then from BodyArgs (stdin scanner) one at a time, calling StartProviding per CID. - With -r, the dag.Walk visit callback emits per visited block; the walk cancels its context on the first announce error to stop fetching. - A typed ProvideOnceEvent (one per queued CID) replaces the prior batch result. JSON output streams {"Queued":"<cid>"} per line. - Text output via PostRun: when stderr is a tty, the running count is redrawn on a single line; otherwise a final count is printed. The text encoder still works for HTTP/RPC consumers (one CID per line). - Adds tests for stdin streaming and --enc=json one-event-per-line. * feat(provide): dedupe across all roots and recursive walks Previously the cid set was scoped per root, so a CID shared by two arguments or by two recursive DAG walks was announced twice. Move the set out to the Run scope so each unique CID is announced exactly once per invocation, regardless of how many times it shows up in argv, stdin, or the DAG walks. For -r, hitting an already-seen CID also stops descent into that subtree, avoiding redundant block fetches when DAGs overlap. 
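The cross-root dedup described in the last commit above can be sketched with an exact set shared across all roots. Everything here is illustrative: `dag` is a hypothetical in-memory map standing in for the blockstore, and `announceAll` is not the command's actual Run function.

```go
package main

import "fmt"

// dag is a hypothetical in-memory graph: CID string -> child CID strings.
type dag map[string][]string

// announceAll calls announce once per unique CID across all roots. With
// recursive set, it walks children and stops descending at any CID it has
// already seen, so overlapping DAGs are not traversed twice.
func announceAll(g dag, roots []string, recursive bool, announce func(string)) {
	seen := map[string]struct{}{} // shared across every root and every walk
	var visit func(c string)
	visit = func(c string) {
		if _, ok := seen[c]; ok {
			return
		}
		seen[c] = struct{}{}
		announce(c)
		if recursive {
			for _, child := range g[c] {
				visit(child)
			}
		}
	}
	for _, r := range roots {
		visit(r)
	}
}

func main() {
	g := dag{"a": {"x", "y"}, "b": {"y", "z"}}
	announceAll(g, []string{"a", "b", "a"}, true, func(c string) {
		fmt.Println("queued", c)
	})
}
```

An exact set like this grows with the number of unique CIDs, which is the memory concern a later commit in this PR addresses by switching to a bloom-based tracker.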
* style(provide): rename useTTY to isTTY in PostRun * refactor(provide): align ipfs provide once with kubo cmds-lib idioms - Use the existing argumentIterator helper from cid.go to read argv followed by stdin, replacing the inlined two-loop variant. - Document why PostRun forks on encoder type (TTY redraw needs to bypass the encoder; json/xml must keep streaming through it). - Log an ERROR for unexpected response types instead of dropping them silently, mirroring the defensive pattern in cat.go's PostRun. * docs(routing): document streaming limitations of routing provide Spell out what 'ipfs routing provide' does worse than 'ipfs provide once' so users on the deprecation path know why to switch: input buffering, no per-cid output, no dedup across recursive roots, and the sync dht lookup that defeats sweep batching. * docs(changelog): rewrite ipfs provide once entry around user impact Recasts the highlight to lead with what the user can now do, not what the code does internally. Adds a one-line example showing the streaming stdin path that the previous version did not surface, and replaces "namespace" plumbing language with the actual capabilities (running count, json-per-line, single announcement per shared block under -r). * feat(provide): use boxo BloomTracker for cross-input dedup Swaps the cid.Set used by 'ipfs provide once' for the autoscaling boxo BloomTracker, the same dedup mechanism that powers Provide.Strategy=+unique. Run executes on the daemon, not the cli, so this caps daemon memory under hostile or accidental input: a user piping 100M cids previously would have grown the daemon's set to ~7 gb of resident memory; with the bloom chain it plateaus around 700 mb at the default fp rate, and under 100 mb up to 10m unique cids. The trade-off is a small false-positive rate (~1 in 4.75m, the kubo default) that can cause an occasional cid to be silently skipped. 
For ad-hoc providing this is acceptable; the regular reprovide cycle will pick up anything matched by Provide.Strategy on the next pass. * docs(changelog): use ipfs refs as the provide once example * style(provide): goimports import order * docs(provide): soften dedup wording, comment re.Emit gate, cover Provide.Enabled=false - Change "exactly once per invocation" to acknowledge the bloom false-positive rate now that the dedup is probabilistic. - Add a comment to the text branch of PostRun warning future readers not to call re.Emit there, since the encoder would race with the TTY counter. - Add a runProviderSuite subtest that exercises Provide.Enabled=false through the new code path (the existing routing-provide test only covers the deprecated alias's Run). * docs(changelog): clarify provide once use case and add second example - Note that provide once is also for fine-tuned control over which CIDs get announced when, alongside the regular reprovide schedule. - Add a second example using ipfs pin ls so users see the pattern for replaying their pinset alongside the dag-walk pattern. * feat(provide): error on ipfs provide once with Provide.DHT.Interval=0 When Provide.DHT.Interval=0, kubo wires NoopProvider via OnlineProviders -> OfflineProviders, so StartProviding silently no-ops and the cid never gets announced. provide once was returning success without any DHT publish: a footgun. Add an explicit precondition check that mirrors the routing reprovide error path. Decoupling the wiring so ad-hoc provide works under Interval=0 is tracked separately. * chore(deps): pin go-libp2p-kad-dht to PR #1246 head Pulls in the WithReprovideInterval(0) burst-only mode from libp2p/go-libp2p-kad-dht#1246 so the kubo side of the Provide.DHT.Interval=0 decoupling can be developed against it. 
* chore(deps): re-pin go-libp2p-kad-dht to PR #1246 head Updates to the latest commit on the upstream branch (817031b) which also relaxes the dual SweepingProvider's reprovide-interval validator to accept 0, on top of the single-provider relaxation in the previous pseudo-version. * feat(provide): decouple Provide.DHT.Interval=0 from the master kill-switch Provide.Enabled is now the only switch that fully turns off the provide system. Provide.DHT.Interval=0 disables only the periodic reprovide schedule; new CIDs still announce via fast-provide-root and 'ipfs provide once'. - groups.go: drop the Interval=0 factor from isProviderEnabled. The real provider (sweep or legacy) is now wired even when Interval=0. - provider.go: skip the keystore sync goroutine in no-schedule mode. The ticker would panic on a zero interval, and with no schedule the keystore has no reader. - cmdenv/env.go: drop the fast-provide-root short-circuit on Interval=0. Provide.Enabled=false is now the only short-circuit. - commands/provide.go: drop the temporary 'cannot provide: Provide.DHT.Interval is 0' error from 'ipfs provide once'. - test/cli: replace the 'Reprovide.Interval=0 disables announcement of new CID too' test (premise is now false) with one asserting that Interval=0 + Enabled=true keeps announcing. Convert the provide-once + Interval=0 test from error path to success path. Tighten the legacy 'Manual Reprovide trigger' test to focus on the error contract. Requires upstream go-libp2p-kad-dht support for WithReprovideInterval(0) (kept under PR #1246). * feat(config): require explicit Provide.Enabled when Provide.DHT.Interval=0 Provide.DHT.Interval=0 used to disable the entire provide system as a side effect. After the decoupling it disables only the periodic reprovide schedule, while new CIDs still announce via fast-provide-root and 'ipfs provide once'. 
To prevent silent semantic drift on upgrade, the daemon now refuses to start when Interval is explicitly set to 0 unless Provide.Enabled is also set explicitly: - Provide.Enabled=false fully disables providing (the old behaviour). - Provide.Enabled=true keeps ad-hoc providing while skipping the periodic reprovide schedule. The error message names both options so operators can pick the one that matches their intent without reading the changelog. * docs: explain new Provide.DHT.Interval=0 semantic Updates docs/config.md and the v0.42 changelog: Interval=0 now disables only the periodic reprovide schedule, and the daemon refuses to start without an explicit Provide.Enabled in that configuration. Calls out both upgrade paths (Provide.Enabled=false to fully disable, or =true to keep ad-hoc providing). * chore(deps): re-pin go-libp2p-kad-dht to amended PR #1246 head Picks up the timeOffset/timeBetween zero-guards so SweepingProvider.Stats() no longer panics with reprovideInterval=0. Required for 'ipfs provide stat' to work in no-schedule mode. * test(provide): align test expectations with new no-schedule semantic - core/commands/commands_test.go: register /provide/once in the expected command list. - test/cli/provide_stats_test.go: 'ipfs provide stat' with Provide.DHT.Interval=0 now returns valid stats (with the schedule timing fields zeroed) instead of erroring out. Update the assertion to match. * chore(deps): re-pin go-libp2p-kad-dht to amended PR #1246 head Picks up the scheduleEnabled() consistency cleanup so timeOffset and timeBetween match the rest of the upstream gates. 
* chore(deps): re-pin go-libp2p-kad-dht to PR #1246 merge on master picks up the three follow-up commits guillaumemichel pushed before merging libp2p/go-libp2p-kad-dht#1246: - refactor: simplify StartProvide() - refactor: minimize change diff - fix: don't remove from keystore on StopProviding() * fix(provide): use ProvideOnce in `ipfs provide once` `ipfs provide once` was calling StartProviding, which in sweep mode persists keys to the keystore and adds them to the periodic reprovide schedule. that contradicts the command's name and help text. switch to ProvideOnce so the command publishes once and leaves the schedule untouched. for the legacy provider StartProviding already wraps ProvideOnce, so legacy behaviour is unchanged. also tighten the help text to state plainly that the schedule is not modified. * fix(provider): keep keystore inert when Provide.DHT.Interval=0 In no-schedule mode the keystore has no reader (no reprovide loop) and no writer (kad-dht's burst path skips Put/Delete). Until now we still opened on-disk leveldb/pebble files for it: wasted disk and noise on upgrade/downgrade. Switch the keystore to an in-memory map in no-schedule mode and make destroyDs a no-op. Also purge any pre-existing keystore directory once at startup so users who toggle from schedule to no-schedule reclaim disk. Replace the literal `reprovideInterval == 0` check at the second call site with the named noScheduleMode flag for consistency.
Update boxo to ipfs/boxo#1128 which removes io.Seeker from the files.File interface. Callers that need seeking now type-assert to io.Seeker. - core/commands/cat: type-assert before seeking - core/coreiface/tests: type-assert before seeking
## Problem

On a repo from `go-ipfs` or Kubo older than v0.27, the one-time migration uses `http.DefaultClient` (no timeouts) against a single hardcoded `trustless-gateway.link`. If that gateway is slow or blocked, the daemon hangs indefinitely before the data store opens, with no fallback. Reported in ipfs/ipfs-desktop#3147, where a user with a v11 repo thought they had lost 4,444 added images.

## Fix

- HTTP client gets dial, TLS, and response-header timeouts (15s, 15s, and boxo's `DefaultRetrievalTimeout` of 30s).
- The `"HTTPS"` alias in `Migration.DownloadSources` expands to five trustless community gateways instead of one. Trust is in local per-block multihash verification, not the operator.
- Outbound requests send `?format=car` (or `?format=ipns-record`) alongside `Accept`, since some gateways honor only one.
- `MultiFetcher` gets a session-scoped quarantine: a failing fetcher moves to the back of the rotation; after three full failed loops it latches `ErrMultiFetcherExhausted` pointing the user at `Migration.DownloadSources`. A cancelled context exits the loop early so it never poisons the quarantine.
- `RetryFetcher` is removed; rotation across distinct gateways replaces same-gateway retries.

Also fixes two pre-existing bugs in the same path: `NewHttpFetcher` ignored the `userAgent` argument so every request shipped Go's default `Go-http-client/1.1`, and `resolveIPNS` leaked the response body.

The `Migration` config and `"HTTPS"` alias keep working the same way for users; the alias just expands to more gateways internally.

Closes #7933
Closes #3137
Closes #8911
Closes ipfs/ipfs-desktop#3147
Surfaced by ipfs/service-worker-gateway#1067, where operators behind a default-deny firewall hit unreachable nodes from browser peers because UDP/4001 (QUIC, WebTransport, WebRTC-Direct) was not opened alongside TCP/4001.
- new docs/production/firewall.md: inspect ufw rules, open 4001/tcp and 4001/udp, optional Kubo application profile, custom-port and rule-removal notes
- daemon health (ipfs diag healthy) split from reachability (ipfs swarm addrs autonat), with Swarm.DisableNatPortMap and Swarm.EnableHolePunching pointers for nodes that stay Private
- link the walkthrough from Addresses.Swarm and the Security section in docs/config.md, and from the Production index in docs/README.md
* refactor: migrate away from cheggaaa/pb v1
* updated changelogs
* fix: add space after comment slashes for consistency
* refactor: share terminal detection in cmdenv
Replace three duplicate TTY checks (get.go, dag/export.go, dag/stat.go)
with `cmdenv.IsTerminal(*os.File)` backed by `mattn/go-isatty`.
The helper uses `IsTerminal || IsCygwinTerminal`, which also detects
MSYS2 and Git Bash on Windows. Those terminals expose stdio as a
named pipe rather than a character device, so the previous
`ModeCharDevice` check suppressed the progress bar on real terminals.
- core/commands/cmdenv/tty.go: new helper
- core/commands/{add,cat,get}.go: drop local isStderrTTY
- core/commands/dag/{export,stat}.go: drop inline stat() block
- go.mod: promote mattn/go-isatty to direct (was indirect via pb/v3)
* refactor: cmdenv.ShouldShowProgress helper
Collapse the explicit-flag-or-TTY-default logic at four call sites
(`cat`, `get`, `dag export`, `dag stat`) into a single helper.
* refactor: dedupe `ipfs add` progress template
The full bar template (counters, bar, speed, percent, ETA) was
inlined at two call sites in add.go. Move it to a file-level
const.
* fix: progress bar shows MiB/s, not MiB p/s
pb v3's speed element defaults to suffix "%s p/s", so even with
pb.Bytes set, `ipfs add`, `ipfs cat`, `ipfs get`, and
`ipfs dag export` rendered the rate as "713.04 MiB p/s" instead
of "713.04 MiB/s".
Pass explicit format args to the speed and rtime template
elements: rate now renders as "MiB/s", and the unknown-state
fallback reads "?/s" / "ETA ?" instead of bare "?". The four
templates move to package-level consts.
* docs: rewrite v0.42 progress bar entry
Describe only the user-visible changes; skip library-migration
detail and intermediate-state claims that never shipped.
* chore: drop unused pb v1 dependabot ignore
The `github.com/cheggaaa/pb` (v1) module path is no longer in
`go.mod` after the migration to `pb/v3`, so the ignore rule
never fires.
* fix(dag): unify --progress help text
Match the wording used by `add`, `cat`, and `get`:
"Stream progress data. Defaults to true when stderr is a
terminal."
* fix(add): finalize progress bar after upload
Call `bar.Finish()` and a final `bar.Write()` after the progress
loop. Without it, fast adds (under ~500ms, where pb/v3's EWMA
never accumulates a speed sample) render `?/s ... ETA ?` in the
last frame. Finishing the bar switches the speed element to its
absolute-rate branch (total/elapsed), so the final frame now
reads e.g. `792.04 MiB/s 100.00% 100ms`.
* test(cmdenv): cover ShouldShowProgress
Exercise the explicit-true, explicit-false, unset, and non-bool
paths. Unset and non-bool fall back to IsTerminal(os.Stderr),
which the test compares against directly so it works in both
TTY and CI environments.
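The explicit-flag-or-TTY-default rule that these tests exercise can be sketched as below. This is an assumption-laden illustration, not the cmdenv code: the lowercase `shouldShowProgress` signature is hypothetical, and terminal detection is injected as a function so the sketch stays independent of any isatty library.

```go
package main

import "fmt"

// shouldShowProgress returns the explicit flag value when the option
// was set to a bool, and otherwise falls back to terminal detection.
// opt is the raw option value; set reports whether the user passed it.
func shouldShowProgress(opt any, set bool, isTTY func() bool) bool {
	if set {
		if v, ok := opt.(bool); ok {
			return v // explicit --progress=true/false wins
		}
	}
	// Unset or non-bool: default to whether stderr is a terminal.
	return isTTY()
}

func main() {
	tty := func() bool { return true }
	noTTY := func() bool { return false }

	fmt.Println(shouldShowProgress(true, true, noTTY))  // explicit true
	fmt.Println(shouldShowProgress(false, true, tty))   // explicit false
	fmt.Println(shouldShowProgress(nil, false, tty))    // unset, terminal
	fmt.Println(shouldShowProgress("yes", true, noTTY)) // non-bool, no terminal
}
```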
* refactor: share full progress bar template
Move the "total known" pb/v3 template to cmdenv.ProgressBarFullTemplate
so add.go and get.go reference the same string instead of keeping
byte-identical local copies. The add init template and dag/export
streaming template stay local because each is single-use and shaped
differently.
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* chore: bump go-libp2p-kad-dht to v0.40.0
* docs: changelog for kad-dht v0.40.0
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* feat(dag): add --local-only to dag export and import
- Export: only export blocks present locally; skip missing (partial
  CAR). --local-only with --offline. Support both binary and base58 link
  keys.
- Import: support partial CARs; --local-only with --pin-roots=false
  (error if both --pin-roots and --local-only set).
- Fix cidFromBinString to accept base58 key format from link
  implementations.
Signed-off-by: Chayan Das <01chayandas@gmail.com>
* chore(deps): update go-car/v2 to latest master
- remove local replace directive for go-car/v2
- upgrade to v2.16.1-0.20260306172652-7d2f4aceb070
* fix(dag): avoid CID round-trip in export and fix ci failure
Signed-off-by: Chayan Das <01chayandas@gmail.com>
* dag: add validation and tests for --local-only flag
Signed-off-by: Chayan Das <01chayandas@gmail.com>
* chore(deps): bump go-car/v2 to latest master
* feat(dag): --local-only auto-sets companion flags
Pass --local-only without pairing it with --offline (export) or
--pin-roots=false (import); the companion is now implicit. Explicit
opposites (--offline=false, --pin-roots=true) are rejected so the intent
stays unambiguous.
- export: imply --offline so missing blocks are not fetched over the
  network, which would defeat --local-only
- import: imply --pin-roots=false since a partial CAR has no full DAG to
  pin
- tests: cover the new implications and the rejected explicit-opposite
  combinations; drop the brittle exec.CommandContext path in favor of
  the existing harness
* refactor(dag): use boxo/walker for --local-only export
The --local-only branch now uses walker.WalkDAG with
WithLocality(bs.Has) and carstorage.NewWritable, matching the MFS+unique
provider in core/node/provider.go.
Semantics: any input-side read error during the walk (missing block,
decode failure, post-locality race) is treated as "not available
locally" and the block plus its subtree are skipped. Output-side errors
(writable.Put) are still surfaced. --help is updated to call out the
best-effort nature.
The non-local-only path is unchanged.
* test(dag): tighten --local-only tests, add subtree-skip case
Pin chunker and max-file-links via a shared shallowDAGArgs so block
counts are deterministic regardless of Import.* defaults or active
profiles.
Tighten existing assertions:
- TestDagExportLocalOnly: assert exact fullCount=3 and
  partialCount=fullCount-1 instead of partialCount<fullCount
- TestDagExportLocalOnlyImpliesOffline: assert exact partial block
  count, not just file Size > 0 (proves --offline was applied)
Add TestDagExportLocalOnlySkipsSubtree: builds a 259-block DAG with
depth>1 (256 chunks under 2 intermediates), removes an intermediate, and
verifies the partial CAR is missing the intermediate plus all 174 of its
descendants. Existing tests only exercised leaf removal.
Extract countCARBlocks and makePartialDAG helpers used across tests.
* docs: changelog entry for --local-only dag export/import
* refactor(dag): wrap API explicitly for --local-only
Replace the req.Options["offline"] = true mutation with an explicit
api.WithOptions(options.Api.Offline(true)) wrap after GetApi, matching
the pattern already used in core/commands/dag/import.go.
Clarify in comments that the walker reads from the raw blockstore (not
via the kubo CoreAPI or DAGService) and therefore cannot trigger a
network fetch by construction. The --offline implication exists for
api.Block().Stat path resolution, not for the DAG walk itself.
* fix(provider): quiet context.Canceled on shutdown
ResetCids returns ctx.Err() straight from its ctx-done select, so a
shutdown-during-sync surfaces as err="context canceled" while the outer
ctx.Err() check at the classifier sometimes races behind the propagation
and logs at Error. Classify context.Canceled the same way as
keystore.ErrClosed so the message lands at Debug. Applied to both the
startup and periodic classifiers.
DeadlineExceeded is intentionally not included: nothing in the current
call chain imposes a deadline, and a future timeout would be a real
failure worth logging at Error.
Closes the flake in TestProviderKeystoreSyncShutdownQuiet (10/10 local
soak now green; CI hit the race 3 reruns in a row).
---------
Signed-off-by: Chayan Das <01chayandas@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 6.0.0 to 6.0.1.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@57e3a13...e79a696)
---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
docs/changelogs/v0.42.md