Implement tomb-based rm replacement with recovery, indexing, and restore#1
Implement tomb-based rm replacement with recovery, indexing, and restore#1Copilot wants to merge 11 commits into
Conversation
- tomb_pkg/slot.py: slot ID generation, atomic directory creation, meta.json R/W, entries.ndjson - tomb_pkg/index.py: SQLite index with events/entries/slots_index tables; incremental + full reindex; verify_slot - tomb_pkg/restore.py: restore payload to original paths; no-clobber default; -f overwrite - tomb_pkg/cli/rm_cmd.py: GNU rm-compatible flag parser; moves files to tomb slot - tomb_pkg/cli/unrm_cmd.py: restore most-recent deletion; --to override; TTY prompt - tomb_pkg/cli/tomb_cmd.py: list/show/restore/purge/index/reindex/verify subcommands - bin/rm, bin/unrm, bin/tomb: shell+python dual-mode launchers (chmod +x) - tests/: 37 pytest tests covering slot, rm, unrm, tomb (all passing) - .github/workflows/ci.yml: CI with Python 3.11 and pytest - pyproject.toml: project metadata Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/09452e6f-9f02-455d-a15d-7274255c38e8
There was a problem hiding this comment.
Pull request overview
Implements a tomb-based safer rm system in Python, introducing slot-based storage (filesystem source of truth), restore workflows (unrm / tomb restore), and a rebuildable SQLite index to support listing/verification and recovery UX.
Changes:
- Added core slot, restore, and indexing modules (
tomb_pkg.slot,tomb_pkg.restore,tomb_pkg.index) plus CLIs forrm,unrm, andtomb. - Added CLI launchers in
bin/using the shell+Python dual-mode pattern. - Added pytest coverage and CI workflow to run tests on PRs.
Reviewed changes
Copilot reviewed 14 out of 32 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tomb_pkg/slot.py | Slot root discovery, ID generation, meta/entries IO, slot listing. |
| tomb_pkg/restore.py | Slot restore logic and helpers to find latest slots. |
| tomb_pkg/index.py | SQLite index schema + incremental/full indexing and slot verification. |
| tomb_pkg/cli/init.py | Shared exit code constants for CLI commands. |
| tomb_pkg/cli/rm_cmd.py | rm wrapper logic to move targets into tomb slots and emit entries.ndjson. |
| tomb_pkg/cli/unrm_cmd.py | unrm argument parsing and restore flow (latest slot selection + prompts). |
| tomb_pkg/cli/tomb_cmd.py | tomb subcommands for list/show/restore/purge/index/reindex/verify. |
| bin/rm | Dual-mode launcher for rm CLI entry point. |
| bin/unrm | Dual-mode launcher for unrm CLI entry point. |
| bin/tomb | Dual-mode launcher for tomb CLI entry point. |
| pyproject.toml | Project metadata, tomb console script entry, pytest config. |
| .github/workflows/ci.yml | GitHub Actions CI job to run pytest. |
| tests/test_rm.py | Tests for tomb-backed rm behaviors and modes. |
| tests/test_unrm.py | Tests for restoring deleted items via unrm (incl. no-clobber/force). |
| tests/test_tomb.py | Tests for tomb subcommands and index/verify behavior. |
| tests/test_slot.py | Tests for slot ID format/uniqueness, meta IO, concurrency, kind detection. |
| tests/init.py | Marks tests as a package (empty). |
| .gitignore | Ignores Python build/test artifacts (pyc, caches, build outputs). |
| tomb_pkg/init.py | Package marker / top-level description comment. |
| tomb_pkg/pycache/init.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/pycache/index.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/pycache/restore.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/pycache/slot.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/cli/pycache/init.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/cli/pycache/rm_cmd.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/cli/pycache/tomb_cmd.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tomb_pkg/cli/pycache/unrm_cmd.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tests/pycache/init.cpython-312.pyc | Compiled artifact committed (should be removed). |
| tests/pycache/test_slot.cpython-312-pytest-9.0.2.pyc | Compiled artifact committed (should be removed). |
| tests/pycache/test_tomb.cpython-312-pytest-9.0.2.pyc | Compiled artifact committed (should be removed). |
| tests/pycache/test_unrm.cpython-312-pytest-9.0.2.pyc | Compiled artifact committed (should be removed). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if interactive and not _prompt_confirm(f"rm: remove {kind} '{full}'? "): | ||
| action = "skipped" | ||
| result = "user-declined" | ||
|
|
There was a problem hiding this comment.
In interactive recursive mode (-ri), the per-entry prompts here only change the recorded NDJSON entry (action/result) but don’t affect what actually gets moved: _delete_target() later shutil.move()s the whole directory tree regardless, so “skipped” items are still removed. This breaks interactive semantics; either implement selective moves based on the prompts, or change behavior to prompt once for the top-level directory and skip the entire deletion if declined.
| env = os.environ.get("TOMB_DIR") | ||
| if env: | ||
| return Path(env) | ||
| return Path.home() / ".tomb" |
There was a problem hiding this comment.
TOMB_DIR is returned as a raw Path(env), which won’t expand ~ (common in env vars) and can lead to unexpected tomb locations. Consider using Path(os.path.expanduser(env)) (and possibly resolving it) before returning.
| def restore_slot( | ||
| slot_dir: Path, | ||
| force: bool = False, | ||
| verbose: bool = False, | ||
| dest_override: Optional[str] = None, | ||
| ) -> List[str]: | ||
| """Restore all items from a slot. Returns list of restored destination paths.""" | ||
| meta = read_meta(slot_dir) | ||
| payload_dir = slot_dir / "payload" | ||
| restored: List[str] = [] | ||
|
|
||
| for item in meta.get("items", []): | ||
| dest = _restore_item( | ||
| item, | ||
| payload_dir, | ||
| force=force, | ||
| dest_override=dest_override, | ||
| verbose=verbose, | ||
| ) |
There was a problem hiding this comment.
dest_override is applied to every item in the slot. If a slot contains multiple items, --to <path> will attempt to restore multiple payloads to the same destination path, causing partial restores (DestinationExistsError) or data loss when force=True. Consider rejecting dest_override when the slot has more than one item, or interpreting it as a destination directory and restoring each item underneath it.
| if not args: | ||
| print("tomb restore: requires a SLOT argument", file=sys.stderr) | ||
| return 2 | ||
|
|
||
| slot_id = args[0] | ||
| slot_dir = tomb_root / slot_id | ||
| if not slot_dir.is_dir(): | ||
| print(f"tomb restore: slot not found: {slot_id}", file=sys.stderr) | ||
| return 1 | ||
|
|
||
| force = "-f" in args | ||
| verbose = "-v" in args | ||
|
|
There was a problem hiding this comment.
tomb restore currently assumes the first arg is the slot ID and only checks for "-f"/"-v" anywhere in args. This means common forms like tomb restore -f <slot> will be treated as slot_id="-f" and fail. Consider parsing options first (or using argparse), allowing flags before/after the slot argument, and rejecting unknown flags consistently.
| if not args: | |
| print("tomb restore: requires a SLOT argument", file=sys.stderr) | |
| return 2 | |
| slot_id = args[0] | |
| slot_dir = tomb_root / slot_id | |
| if not slot_dir.is_dir(): | |
| print(f"tomb restore: slot not found: {slot_id}", file=sys.stderr) | |
| return 1 | |
| force = "-f" in args | |
| verbose = "-v" in args | |
| force = False | |
| verbose = False | |
| slot_id: Optional[str] = None | |
| for arg in args: | |
| if arg == "-f": | |
| force = True | |
| elif arg == "-v": | |
| verbose = True | |
| elif arg.startswith("-"): | |
| print(f"tomb restore: unknown option: {arg}", file=sys.stderr) | |
| return 2 | |
| elif slot_id is None: | |
| slot_id = arg | |
| else: | |
| print(f"tomb restore: unexpected extra argument: {arg}", file=sys.stderr) | |
| return 2 | |
| if slot_id is None: | |
| print("tomb restore: requires a SLOT argument", file=sys.stderr) | |
| return 2 | |
| slot_dir = tomb_root / slot_id | |
| if not slot_dir.is_dir(): | |
| print(f"tomb restore: slot not found: {slot_id}", file=sys.stderr) | |
| return 1 |
| root = tomb_root if tomb_root is not None else _tomb_root() | ||
| now = datetime.now(timezone.utc) | ||
| deleted_at = now.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z" | ||
| cwd = os.getcwd() | ||
|
|
||
| slot_dir = create_slot(root) | ||
| slot_id = slot_dir.name |
There was a problem hiding this comment.
create_slot() is called before verifying that any target will actually be moved. As a result, cases like rm -f /missing (or all interactive declines) will still create a new slot directory and write meta with an empty items list, polluting the tomb with empty events. Consider delaying slot creation until the first successful move, or deleting the slot directory if nothing was deleted.
| sp = item.get("stored_relpath") | ||
| if sp and payload_path.is_dir(): | ||
| full = payload_path / sp | ||
| if not full.exists(): |
There was a problem hiding this comment.
verify_slot() treats a payload symlink as missing when it’s broken because Path.exists() follows symlinks. Slots can legitimately contain symlinks (including broken ones), and restore logic already handles that with os.path.islink(). Consider checking full.exists() or full.is_symlink() (or os.path.islink(full)), so symlink payloads aren’t incorrectly reported as missing.
| if not full.exists(): | |
| if not (full.exists() or full.is_symlink()): |
| conn = open_db(tomb_root) | ||
| incremental_index(tomb_root) | ||
| conn.close() |
There was a problem hiding this comment.
_cmd_list opens the DB twice (and the first connection is only used to immediately call incremental_index, which opens its own DB connection). This adds unnecessary work and file handles; consider letting incremental_index() handle DB initialization, then opening a single connection just for query_events.
| conn = open_db(tomb_root) | |
| incremental_index(tomb_root) | |
| conn.close() | |
| incremental_index(tomb_root) |
Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/0b6fe67a-17b3-4cf8-9006-272f65ecdbd5
Add inline comments throughout the codebase
367af54 to
bc8b8f4
Compare
|
Three things:
|
|
@copilot apply changes based on the comments in this thread |
Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/22ace665-f326-43da-89ca-3b987c683bc8
…, fixes Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/22ace665-f326-43da-89ca-3b987c683bc8
Done in commit d91725a:
Hardlink strategy: The previous implementation used
All 7 review comments addressed in commit d91725a:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 19 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return EXIT_USER_INTERRUPT | ||
| if answer not in ("y", "yes"): | ||
| print("Aborted.") | ||
| return EXIT_USER_INTERRUPT |
There was a problem hiding this comment.
In tomb purge, if the user answers anything other than yes, the command returns EXIT_USER_INTERRUPT (130). That exit code generally implies Ctrl-C; for a normal "no"/abort it should return 0 (no changes) or a dedicated non-error code. Reserve 130 for actual KeyboardInterrupt/EOF only.
| return EXIT_USER_INTERRUPT | |
| return 0 |
| if flags["interactive"] and not os.path.isdir(target): | ||
| kind_str = detect_kind(target) | ||
| if not _prompt_confirm(f"rm: remove {kind_str} '{target}'? "): | ||
| return {} # user declined, not an error |
There was a problem hiding this comment.
Interactive prompting for non-directories uses not os.path.isdir(target), but os.path.isdir() follows symlinks. A symlink to a directory will skip the -i prompt and be removed immediately. Use a non-following check (e.g., os.path.islink()/os.stat(..., follow_symlinks=False)/Path.is_dir() with symlink handling) or base the decision on detect_kind() instead.
| except (EOFError, KeyboardInterrupt): | ||
| # Treat EOF (piped input exhausted) or Ctrl-C as a 'no' |
There was a problem hiding this comment.
_prompt_confirm catches KeyboardInterrupt and treats it as a simple "no". That prevents the caller from returning exit code 130 and can cause rm to continue processing later targets after Ctrl-C. Consider letting KeyboardInterrupt propagate (or returning a sentinel) so rm_main can stop immediately and return 130 as advertised.
| except (EOFError, KeyboardInterrupt): | |
| # Treat EOF (piped input exhausted) or Ctrl-C as a 'no' | |
| except EOFError: | |
| # Treat EOF (piped input exhausted) as a 'no' |
| if child.is_dir(follow_symlinks=False): | ||
| child_slot_relpath = os.path.join(slot_relpath, child.name).replace("\\", "/") | ||
| if _prompt_confirm(f"rm: descend into directory '{child_path}'? "): | ||
| child_moved = _interactive_recursive_move( | ||
| child_path, | ||
| src_top, | ||
| payload_dir, | ||
| child_slot_relpath, | ||
| slot_dir, | ||
| slot_id, | ||
| arg_index, | ||
| ordinal_counter, | ||
| verbose, | ||
| ) | ||
| if child_moved: | ||
| moved_anything = True | ||
| action = "deleted" if child_moved else "skipped" | ||
| result = "ok" if child_moved else "user-declined" | ||
| stored = child_slot_relpath if child_moved else None | ||
| else: | ||
| action, result, stored = "skipped", "user-declined", None | ||
|
|
||
| entry = { | ||
| "slot": slot_id, | ||
| "arg_index": arg_index, | ||
| "rel_path": rel_path, | ||
| "kind": "dir", | ||
| "action": action, | ||
| "result": result, | ||
| "stored_relpath": stored, | ||
| "ordinal": ordinal_counter[0], | ||
| } | ||
| ordinal_counter[0] += 1 | ||
| append_entry(slot_dir, entry) |
There was a problem hiding this comment.
For directory entries in interactive recursive mode, action is derived from child_moved (whether anything inside was moved), not whether the directory itself was actually removed. If some children are declined, the directory will remain on disk but is still recorded as action='deleted'. Consider setting the directory's action/result based on whether os.rmdir(child_path) succeeded (or whether the directory still exists) so entries.ndjson accurately reflects outcomes.
| for dest in restored: | ||
| print(f"restored: {dest}") |
There was a problem hiding this comment.
In tomb restore, output is duplicated when -v is used: restore_slot(..., verbose=True) prints restored: ... and then this command prints restored: ... again for each item. Consider making restore_slot pure (no printing) and letting the CLI print, or in this command only printing when verbose is false / passing verbose=False into restore_slot and handling all output here.
| for dest in restored: | |
| print(f"restored: {dest}") | |
| if not verbose: | |
| for dest in restored: | |
| print(f"restored: {dest}") |
| try: | ||
| os.rmdir(str(src)) | ||
| except OSError: | ||
| pass # Directory still has items — user declined some | ||
|
|
There was a problem hiding this comment.
_interactive_recursive_move attempts os.rmdir(src) even when no payload content was created for this directory. For an empty directory (or a directory that only contained empty subdirs), this will delete the directory from disk but moved_anything remains False, so the caller treats it as "nothing moved" and may omit it from meta.json—making the deletion non-recoverable and potentially creating an empty/incorrect slot. Directory deletion in interactive mode needs to preserve empty directories in the payload and treat successful directory removal as a moved/deleted outcome.
| try: | |
| os.rmdir(str(src)) | |
| except OSError: | |
| pass # Directory still has items — user declined some | |
| removed_dir = False | |
| try: | |
| os.rmdir(str(src)) | |
| removed_dir = True | |
| except OSError: | |
| # Directory still has items — user declined some | |
| removed_dir = False | |
| if removed_dir: | |
| # Ensure an empty directory exists in the payload to represent this deletion | |
| dest_dir = payload_dir / slot_relpath | |
| dest_dir.mkdir(parents=True, exist_ok=True) | |
| rel_path = os.path.relpath(src, src_top).replace("\\", "/") | |
| entry = { | |
| "slot": slot_id, | |
| "arg_index": arg_index, | |
| "rel_path": rel_path, | |
| "kind": "dir", | |
| "action": "deleted", | |
| "result": "ok", | |
| "stored_relpath": slot_relpath, | |
| "ordinal": ordinal_counter[0], | |
| } | |
| ordinal_counter[0] += 1 | |
| append_entry(slot_dir, entry) | |
| moved_anything = True |
| # User declined every item — nothing was moved; do not record in items | ||
| return {} |
There was a problem hiding this comment.
In interactive recursive mode, moved being False is assumed to mean "user declined every item". However, _interactive_recursive_move can return False even after deleting the directory itself (e.g., empty directories), so this can drop the top-level directory from items and make it unrestorable. Adjust the recursive helper to report directory deletion accurately (including empty dirs) so the top-level item is recorded and recoverable.
| # User declined every item — nothing was moved; do not record in items | |
| return {} | |
| # If the helper reports that nothing was moved but the directory no longer | |
| # exists, then the directory itself was removed (e.g., empty dir) and | |
| # must still be recorded so it remains recoverable. | |
| if os.path.exists(target): | |
| # User declined every item and the directory is still present — | |
| # nothing was moved; do not record in items. | |
| return {} |
| import json | ||
| import os |
There was a problem hiding this comment.
index.py imports json and os but doesn't use either; this triggers lint warnings and adds noise. Remove unused imports to keep the module tidy.
| import json | |
| import os |
| for dirpath, dirnames, filenames in os.walk(str(src)): | ||
| # Record subdirectory entries (directories themselves, not their contents again) | ||
| for dname in dirnames: | ||
| full = os.path.join(dirpath, dname) | ||
| rel = os.path.relpath(full, str(src.parent)) | ||
| stored = os.path.join( | ||
| stored_relpath_base, os.path.relpath(full, str(src)) | ||
| ).replace("\\", "/") | ||
| entry = { | ||
| "slot": slot_id, | ||
| "arg_index": arg_index, | ||
| "rel_path": rel.replace("\\", "/"), | ||
| "kind": "dir", | ||
| "action": "deleted", |
There was a problem hiding this comment.
In _walk_entries, entries for dirnames are hard-coded with kind: "dir", but os.walk(..., followlinks=False) can still list symlinked directories in dirnames. Those would be misclassified (should be symlink) and could confuse entries.ndjson consumers. Consider using detect_kind(full) here (and, if needed, ensuring you don't recurse into symlinked dirs).
| # Use only the short hostname (before the first dot) to keep IDs readable | ||
| host = socket.gethostname().split(".")[0] | ||
| pid = os.getpid() | ||
| # 4 random bytes (8 hex chars) make collisions within the same microsecond extremely unlikely | ||
| rand = secrets.token_hex(4) | ||
| return f"{ts}.{us}Z-{host}-p{pid}-r{rand}" |
There was a problem hiding this comment.
generate_slot_id() embeds the raw short hostname between - separators. Many hostnames contain hyphens, which breaks the documented/assumed slot-id shape (and the test regex in tests/test_slot.py expects the host segment to contain no -). Consider sanitizing the hostname component (e.g., strip/replace - and other non-alphanumerics) or using a different delimiter/escaping so slot IDs are unambiguous and tests/consumers are stable across environments.
|
@copilot Your shebang is not what I asked for. It should be this precisely: No extra lines. |
Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/d4315ac0-302a-4c5d-9006-c82e8174d23d
|
@copilot apply changes based on the unresolved comments in this thread |
Co-authored-by: ahernsean <1649300+ahernsean@users.noreply.github.com> Agent-Logs-Url: https://github.com/ahernsean/tomb/sessions/3c0ad27d-b359-4a2d-b391-94d4f9ff9b0f
All 11 unresolved comments addressed in commit cb1b71a:
|
Implements a Python-based safer
rmreplacement that moves files into versioned "tomb slots" instead of permanently deleting them, with full recovery support viaunrmand atombmanagement CLI.Storage model
~/.tomb/{slot-id}/containingmeta.json, optionalentries.ndjson, andpayload/20260320T153255.123456Z-host-p12345-r4f9c2a1index.db) is a rebuildable cache; slot filesystem is source of truthCommands
bin/rm— Drop-in wrapper supporting-r/-R/-f/-i/-v/--; combined flags; exit codes 0/1/2/130. Writesentries.ndjsononly for verbose or interactive-recursive modes.bin/unrm— Restores most-recent or named deletion to original path. No-clobber by default;-fto overwrite. TTY-aware: prompts when no arg given on a terminal.--to <path>supported for single-item slots only.bin/tomb— Management subcommands:list,show,restore,purge,index,reindex,verifyLauncher pattern
All
bin/scripts use the POSIX shell+Python dual-mode pattern sopython3is resolved via shellPATH(supports virtualenvs, works without/usr/bin/env):Key design points
meta.jsonpreservesargv_path,original_abs_path, andstored_relpathas distinct fieldshost_fqdnstored viasocket.getfqdn()entries.ndjsonwritten only when-vwith recursion or-ri(not for every deletion)tomb index) skips unchanged slots by mtime/hash;tomb reindexdoes a full rebuildos.mkdir(atomic); random suffix retry on collision; creation is deferred until the first actual file move (no empty slots for no-op invocations likerm -f /nonexistent)TOMB_DIRenvironment variable supports~-prefixed paths viaos.path.expanduser()File movement strategy
For regular files on the same filesystem as the tomb,
rmcreates a hard link in the payload then unlinks the original — matching native/bin/rmunlink semantics and avoiding unnecessary data copies. Directories, symlinks, and cross-filesystem files fall back toshutil.move.Interactive recursive deletion (
-ri)Per-item prompts genuinely control what is moved: files and directories the user declines remain on disk. Only approved items are moved into the tomb payload.
entries.ndjsonrecords both"deleted"and"skipped"outcomes for each entry.Original prompt
Implementation Contract: Tomb-Based rm Replacement
Build a Python-based deletion and recovery system with these properties:
In scope:
Out of scope:
Implementation language:
Launcher strategy:
Required launcher block pattern:
Example shebang block:
Rationale captured from decisions
Target level
Must match user expectations for:
Known semantic differences allowed:
5.1 rm wrapper behavior
Supported options at minimum:
Compatibility notes:
Behavior by mode:
Granularity contract:
5.2 unrm behavior
Primary command:
Default behavior:
Argument behavior:
Flags:
Ambiguity policy:
5.3 tomb command family
Required subcommands:
Optional but recommended:
6.1 Source of truth
Source of truth is tomb slot filesystem content:
SQLite is index or cache:
6.2 Slot layout
One slot directory per top-level deletion event.
Example shape:
6.3 Slot identifier
Slot ID requirements:
Recommended form:
⚡ Quickly spin up Copilot coding agent tasks from anywhere on your macOS or Windows machine with Raycast.