Merged
22 changes: 22 additions & 0 deletions README.md
@@ -89,6 +89,25 @@ Because the command is exec'd without a shell, the volume path is never string-c

> **Don't double-schedule verification.** If your external tool already runs its own verify on a timer (e.g. a cron/systemd job), don't *also* set `interval` for a verify command — two heavy passes will step on each other. Pick one driver: let squirrel schedule it (so the result lands in `squirrel hooks` / the TUI) **or** let the tool schedule it (maximum independence — verification keeps happening even when the agent is down), not both.

### Index snapshots

The catalog should be as redundant as the data it describes. After every successful sync, squirrel takes one `VACUUM INTO` snapshot of the whole index (a self-contained, `db check`-able `.db` file) to a **local tier** and — for destination (bucket/sftp/…) syncs — rides a copy **along to the destination**, under each synced volume's `.squirrel-index/`. A restore-from-cloud then yields the data *and* the index that explains it.

This is **on by default, zero-config** — an absent `[backups]` table means it's enabled with the defaults below. Override or disable via:

```toml
[backups]
enabled = true # local snapshot-on-sync (default true)
dir = "" # local snapshot directory (default: <dir of db>/backups)
keep = 7 # local snapshots kept (rotation; 0 = keep all)
cloud = true # ride a copy along to destination buckets (default true)
cloud_keep = 7 # snapshots kept per <dest>/<volume>/.squirrel-index/ (0 = keep all)
```

`enabled = false` disables both halves; `cloud = false` keeps the local snapshot but uploads nothing. Snapshots are named `index-<ISO8601>-run-<id>.db` — lexically sortable and traceable to the run that produced them. A single snapshot is taken per `squirrel sync` invocation and fanned out to every target; a snapshot or upload failure is surfaced as a warning but never fails the sync.

> **Privacy.** The ride-along payload is the *full global* `index.db` — paths and BLAKE3 hashes for **all** volumes (never file contents). It lands in the same bucket as your data (same trust boundary). Use a private bucket and server-side encryption.

## Quickstart

Index a configured volume:
@@ -173,13 +192,16 @@ Each destination is a tree shaped like the local volumes:
pictures/
2024/cat.jpg
.squirrel-history/run-7/2024/cat.jpg # prior content of cat.jpg
.squirrel-index/index-20260604T120000.000Z-run-12.db # global index snapshot (ride-along)
docs/
invoice.pdf
.squirrel-history/run-9/invoice.pdf
```

`.squirrel-history/run-<run-id>/` is rclone's `--backup-dir` target for that sync run. It is filtered out of all subsequent comparisons so it does not grow rclone's listing time or get uploaded back. A directory literally called `.squirrel-history` in your source volume is also filtered (with a warning), so that the reserved name cannot end up in the destination tree by accident.

`.squirrel-index/` holds the index snapshots ridden along after each successful sync (see [Index snapshots](#index-snapshots)). Like `.squirrel-history`, it is filtered out of all sync and restore transfers and from peer-sync, so a snapshot is never mistaken for user content.
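
The reserved-name rule can be pictured as a predicate over relative paths. The sketch below is only an illustration under assumptions: `isReserved` and `reservedDirs` are hypothetical names, and how squirrel actually wires the exclusion into its transfers (for example through rclone filter rules) is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedDirs lists the directory names the README describes as reserved.
var reservedDirs = map[string]bool{
	".squirrel-history": true,
	".squirrel-index":   true,
}

// isReserved reports whether any segment of a slash-separated relative
// path is exactly one of the reserved directory names. Hypothetical
// helper, for illustration only.
func isReserved(rel string) bool {
	for _, seg := range strings.Split(rel, "/") {
		if reservedDirs[seg] {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"2024/cat.jpg",
		".squirrel-index/index-20260604T120000.000Z-run-12.db",
		".squirrel-history/run-7/2024/cat.jpg",
		"notes/my.squirrel-index.txt",
	} {
		fmt.Printf("%-55s reserved=%v\n", p, isReserved(p))
	}
}
```

Matching whole path segments rather than substrings keeps ordinary files whose names merely contain a reserved string from being excluded.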

## Notes

- Hash: BLAKE3-256 via `github.com/zeebo/blake3`. Stored as a 32-byte `BLOB` in the `blake3` column. The CLI accepts and prints hex.
9 changes: 8 additions & 1 deletion cmd/squirrel/agent.go
@@ -180,7 +180,14 @@ func buildSchedulerSyncRunner(cfg *config.Config, s *store.Store, rcl *sync.Rclo
 		if err != nil {
 			return agent.SyncRunReport{Err: err}
 		}
-		rep, runErr := sync.RunPair(ctx, s, rcl, pair, sync.Options{})
+		// Snapshot-on-sync fires on each node's scheduled syncs too (#75):
+		// this is the operating cadence the catalog churns on. Each kick is
+		// a single pair, so a fresh Snapshotter per kick is the right unit.
+		opts := sync.Options{}
+		if cfg.Backups.Enabled {
+			opts.Snapshot = sync.NewSnapshotter(s, rcl, snapshotConfig(cfg, s.Path()))
+		}
+		rep, runErr := sync.RunPair(ctx, s, rcl, pair, opts)
 		return agent.SyncRunReport{RunID: rep.RunID, Status: rep.Status, Err: runErr}
 	}
 }
30 changes: 30 additions & 0 deletions cmd/squirrel/sync.go
@@ -74,6 +74,13 @@ func runSync(cmd *cobra.Command, volumeName, destinationName string, opts sync.O
	if err := writeRcloneConfigLogged(out, rcl, cfg); err != nil {
		return err
	}
	// One snapshotter shared across every pair: the VACUUM INTO snapshot
	// is taken once per invocation and fanned out (decision #1). Disabled
	// for dry-run (no run rows to snapshot against) and when [backups] is
	// turned off.
	if !opts.DryRun && cfg.Backups.Enabled {
		opts.Snapshot = sync.NewSnapshotter(s, rcl, snapshotConfig(cfg, s.Path()))
	}

	var anyFailed bool
	for _, p := range pairs {
@@ -89,6 +96,23 @@ func runSync(cmd *cobra.Command, volumeName, destinationName string, opts sync.O
	return nil
}
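
One way to picture "taken once per invocation and fanned out" is a snapshotter that memoizes its first result. The sketch below is an assumption about the shape of such a type, not the implementation behind `sync.NewSnapshotter` (which this diff does not show). `onceSnapshotter` and its `take` field are hypothetical, and the `sync` imported here is Go's standard library package, not squirrel's own `sync` package.

```go
package main

import (
	"fmt"
	"sync" // standard library sync, not squirrel's sync package
)

// onceSnapshotter is a hypothetical stand-in: the first caller triggers
// the expensive snapshot, and every later caller reuses its result.
type onceSnapshotter struct {
	once sync.Once
	take func() (string, error) // produces the snapshot file
	path string
	err  error
}

// Snapshot returns the path of the single snapshot for this invocation,
// creating it on first use.
func (s *onceSnapshotter) Snapshot() (string, error) {
	s.once.Do(func() { s.path, s.err = s.take() })
	return s.path, s.err
}

func main() {
	calls := 0
	snap := &onceSnapshotter{take: func() (string, error) {
		calls++
		return "backups/index-snapshot.db", nil // placeholder path
	}}

	// Three pairs in one invocation share one snapshot file.
	for _, pair := range []string{"pictures->bucket", "docs->bucket", "docs->sftp"} {
		path, err := snap.Snapshot()
		fmt.Println(pair, path, err)
	}
	fmt.Println("snapshots taken:", calls)
}
```

Sharing one value across the loop over pairs is what makes the single-snapshot guarantee possible. The scheduler path builds a fresh snapshotter per kick because each kick runs a single pair.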

// snapshotConfig resolves the [backups] config into the sync package's
// SnapshotConfig, filling an empty backups dir with the default sibling
// backups/ directory next to the live DB (the same tier `db backup` and
// the pre-migration snapshots use).
func snapshotConfig(cfg *config.Config, dbPath string) sync.SnapshotConfig {
	dir := cfg.Backups.Dir
	if dir == "" {
		dir = defaultBackupsDir(dbPath)
	}
	return sync.SnapshotConfig{
		Dir:       dir,
		Keep:      cfg.Backups.Keep,
		Cloud:     cfg.Backups.Cloud,
		CloudKeep: cfg.Backups.CloudKeep,
	}
}

// shallowSyncWarning is printed at the CLI layer when a sync or restore
// runs with --shallow. It spells out the safety trade so the operator
// knows a destination whose size and mtime happen to match the source
@@ -164,6 +188,12 @@ func printSyncReport(w io.Writer, rep sync.Report, runErr error) {
		// the runs row is stuck in 'running' until manually reconciled.
		fmt.Fprintf(w, " warning: failed to record terminal run state: %v\n", rep.FinishErr)
	}
	if rep.SnapshotErr != nil {
		// Defense-in-depth only: the sync itself succeeded; the index
		// snapshot or its ride-along did not. Surface it without colouring
		// the run's status.
		fmt.Fprintf(w, " warning: index snapshot: %v\n", rep.SnapshotErr)
	}
	if runErr != nil {
		fmt.Fprintf(w, " %v\n", runErr)
	}
89 changes: 89 additions & 0 deletions config/backups.go
@@ -0,0 +1,89 @@
package config

import "fmt"

// Backups is the resolved `[backups]` configuration governing the
// snapshot-on-sync feature (#75): after a successful sync, squirrel takes
// a VACUUM INTO snapshot of the index to a local tier and — for
// destination syncs — rides a copy along to the destination bucket so the
// catalog inherits the same redundancy as the data it describes.
//
// Defense-in-depth is the default: an absent `[backups]` table means
// "enabled with the defaults below", and individual keys override only
// what they name. Setting Enabled=false disables both halves; Cloud=false
// keeps the local snapshot but skips the ride-along upload.
type Backups struct {
	// Enabled gates the whole feature: the local snapshot-on-sync and,
	// transitively, the cloud ride-along.
	Enabled bool
	// Dir is the local snapshot directory. Empty means the consumer
	// resolves it to "<dirname(db)>/backups" (the same sibling directory
	// the pre-migration and `db backup` snapshots use); the dependency on
	// the resolved DB path is why the default is applied at the call site
	// rather than here.
	Dir string
	// Keep bounds the local snapshot directory: after writing, the oldest
	// snapshots are rotated away until at most Keep remain. Zero means no
	// rotation.
	Keep int
	// Cloud gates the destination ride-along. Ignored when Enabled is
	// false (no snapshot is taken to upload).
	Cloud bool
	// CloudKeep bounds each destination's per-volume .squirrel-index/
	// directory. Zero means no rotation.
	CloudKeep int
}
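
The rotation rule described for `Keep` and `CloudKeep` can be sketched as a pure function over snapshot names. This is an illustration under stated assumptions, not squirrel's rotation code (which is not part of this diff): `rotate` is a hypothetical helper, and it relies on names sorting chronologically, as the `index-<ISO8601>-run-<id>.db` scheme is documented to do.

```go
package main

import (
	"fmt"
	"sort"
)

// rotate splits snapshot names into those to keep and those to remove so
// that at most keep remain, dropping the oldest first. keep == 0 means
// "no rotation". Negative values are rejected at config load, so they are
// treated like 0 here. Hypothetical helper, for illustration only.
func rotate(names []string, keep int) (kept, removed []string) {
	sorted := append([]string(nil), names...) // leave the caller's slice untouched
	sort.Strings(sorted)                      // oldest first, given sortable names
	if keep <= 0 || len(sorted) <= keep {
		return sorted, nil
	}
	cut := len(sorted) - keep
	return sorted[cut:], sorted[:cut]
}

func main() {
	names := []string{
		"index-20260606T120000.000Z-run-14.db",
		"index-20260604T120000.000Z-run-12.db",
		"index-20260605T120000.000Z-run-13.db",
	}
	kept, removed := rotate(names, 2)
	fmt.Println("kept:   ", kept)
	fmt.Println("removed:", removed)
}
```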

// DefaultBackups returns the zero-config defaults: both halves on, seven
// snapshots kept on each tier, local directory resolved by the consumer.
func DefaultBackups() Backups {
	return Backups{Enabled: true, Dir: "", Keep: 7, Cloud: true, CloudKeep: 7}
}

// rawBackups is the on-disk shape of the `[backups]` table. Every field is
// a pointer (or, for Dir, distinguished by emptiness) so resolveBackups
// can tell "key omitted" from "key set to the zero value" — without that,
// `enabled = false` would be indistinguishable from a missing key.
type rawBackups struct {
	Enabled   *bool  `toml:"enabled"`
	Dir       string `toml:"dir"`
	Keep      *int   `toml:"keep"`
	Cloud     *bool  `toml:"cloud"`
	CloudKeep *int   `toml:"cloud_keep"`
}
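
The pointer-field technique is independent of the decoder. The sketch below shows the same "omitted versus zero" distinction using `encoding/json` from the standard library as a stand-in for the TOML decoder, so that it runs without third-party packages. `rawFlags` and `decode` are hypothetical names, and the JSON documents are illustrative, not squirrel config.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawFlags uses pointer fields so that an omitted key stays nil, while a
// key explicitly set to false or 0 becomes a non-nil pointer to that
// zero value.
type rawFlags struct {
	Enabled *bool `json:"enabled"`
	Keep    *int  `json:"keep"`
}

// decode parses one document into rawFlags.
func decode(doc string) (rawFlags, error) {
	var r rawFlags
	err := json.Unmarshal([]byte(doc), &r)
	return r, err
}

func main() {
	for _, doc := range []string{`{}`, `{"enabled": false}`, `{"keep": 0}`} {
		r, err := decode(doc)
		if err != nil {
			fmt.Println("decode:", err)
			continue
		}
		fmt.Printf("%-20s enabled set: %-5v keep set: %v\n",
			doc, r.Enabled != nil, r.Keep != nil)
	}
}
```

With plain `bool` and `int` fields, all three documents would decode to the same struct, and a resolver could not tell an explicit `enabled = false` from a missing key.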

// resolveBackups folds an optional `[backups]` table over the defaults. A
// nil raw (no table) yields DefaultBackups unchanged. Present keys
// override; Keep and CloudKeep must be non-negative.
func resolveBackups(raw *rawBackups) (Backups, error) {
	b := DefaultBackups()
	if raw == nil {
		return b, nil
	}
	if raw.Enabled != nil {
		b.Enabled = *raw.Enabled
	}
	if raw.Dir != "" {
		dir, err := expandPath(raw.Dir)
		if err != nil {
			return Backups{}, fmt.Errorf("dir: %w", err)
		}
		b.Dir = dir
	}
	if raw.Keep != nil {
		if *raw.Keep < 0 {
			return Backups{}, fmt.Errorf("keep must be non-negative, got %d", *raw.Keep)
		}
		b.Keep = *raw.Keep
	}
	if raw.Cloud != nil {
		b.Cloud = *raw.Cloud
	}
	if raw.CloudKeep != nil {
		if *raw.CloudKeep < 0 {
			return Backups{}, fmt.Errorf("cloud_keep must be non-negative, got %d", *raw.CloudKeep)
		}
		b.CloudKeep = *raw.CloudKeep
	}
	return b, nil
}
124 changes: 124 additions & 0 deletions config/backups_test.go
@@ -0,0 +1,124 @@
package config

import (
	"strings"
	"testing"
)

// TestBackupsDefaultWhenAbsent: an absent [backups] table resolves to the
// zero-config defaults: on by default, both halves enabled.
func TestBackupsDefaultWhenAbsent(t *testing.T) {
	p := writeConfig(t, `
[volumes.pics]
path = "/tmp/pics"
`)
	cfg, err := Load(p)
	if err != nil {
		t.Fatalf("Load: %v", err)
	}
	want := DefaultBackups()
	if cfg.Backups != want {
		t.Fatalf("Backups = %+v, want defaults %+v", cfg.Backups, want)
	}
}

// TestBackupsOverrides: present keys override the defaults; omitted keys
// keep their default. dir is expanded to an absolute path.
func TestBackupsOverrides(t *testing.T) {
	p := writeConfig(t, `
[backups]
keep = 3
cloud_keep = 10
dir = "/var/backups/squirrel"

[volumes.pics]
path = "/tmp/pics"
`)
	cfg, err := Load(p)
	if err != nil {
		t.Fatalf("Load: %v", err)
	}
	b := cfg.Backups
	if !b.Enabled || !b.Cloud {
		t.Fatalf("Enabled=%v Cloud=%v, want both true (omitted → default)", b.Enabled, b.Cloud)
	}
	if b.Keep != 3 || b.CloudKeep != 10 {
		t.Fatalf("Keep=%d CloudKeep=%d, want 3/10", b.Keep, b.CloudKeep)
	}
	if b.Dir != "/var/backups/squirrel" {
		t.Fatalf("Dir = %q, want /var/backups/squirrel", b.Dir)
	}
}

// TestBackupsDisabled: enabled=false is distinguishable from "omitted"
// thanks to the pointer field, and turns the whole feature off.
func TestBackupsDisabled(t *testing.T) {
	p := writeConfig(t, `
[backups]
enabled = false

[volumes.pics]
path = "/tmp/pics"
`)
	cfg, err := Load(p)
	if err != nil {
		t.Fatalf("Load: %v", err)
	}
	if cfg.Backups.Enabled {
		t.Fatalf("Enabled = true, want false")
	}
	// Cloud keeps its default; Enabled is the master switch the consumer
	// checks first.
	if !cfg.Backups.Cloud {
		t.Fatalf("Cloud = false, want default true (only enabled was set)")
	}
}

// TestBackupsCloudDisabled: cloud=false keeps the local snapshot on but
// turns off the ride-along.
func TestBackupsCloudDisabled(t *testing.T) {
	p := writeConfig(t, `
[backups]
cloud = false

[volumes.pics]
path = "/tmp/pics"
`)
	cfg, err := Load(p)
	if err != nil {
		t.Fatalf("Load: %v", err)
	}
	if !cfg.Backups.Enabled {
		t.Fatalf("Enabled = false, want true")
	}
	if cfg.Backups.Cloud {
		t.Fatalf("Cloud = true, want false")
	}
}

func TestBackupsRejectsNegativeKeep(t *testing.T) {
	p := writeConfig(t, `
[backups]
keep = -1

[volumes.pics]
path = "/tmp/pics"
`)
	_, err := Load(p)
	if err == nil || !strings.Contains(err.Error(), "keep must be non-negative") {
		t.Fatalf("expected negative-keep error, got %v", err)
	}
}

func TestBackupsRejectsUnknownField(t *testing.T) {
	p := writeConfig(t, `
[backups]
nope = true

[volumes.pics]
path = "/tmp/pics"
`)
	if _, err := Load(p); err == nil {
		t.Fatalf("expected unknown-field error for [backups].nope")
	}
}
10 changes: 10 additions & 0 deletions config/config.go
@@ -58,6 +58,10 @@ type Config struct {
	// Agent is non-nil when the config declares an `[agent]` block. The
	// agent subcommand requires it; other subcommands ignore it.
	Agent *Agent
	// Backups is the resolved `[backups]` configuration. Always populated:
	// an absent table resolves to DefaultBackups (snapshot-on-sync on with
	// sensible defaults).
	Backups Backups
}

// Volume is one indexable root.
@@ -190,6 +194,7 @@ type rawConfig struct {
	Destinations map[string]map[string]any `toml:"destinations"`
	Nodes        map[string]rawNode        `toml:"nodes"`
	Agent        *rawAgent                 `toml:"agent"`
	Backups      *rawBackups               `toml:"backups"`
}

type rawVolume struct {
@@ -257,6 +262,11 @@ func (r *rawConfig) resolve(path string) (*Config, error) {
		}
		cfg.Agent = a
	}
	backups, err := resolveBackups(r.Backups)
	if err != nil {
		return nil, fmt.Errorf("backups: %w", err)
	}
	cfg.Backups = backups
	return cfg, nil
}
