
sync: redundant git apply --reject after partial git apply -3 emits spurious .rej alongside conflict markers #543

Description

@tschm

Summary

During rhiza sync (3-way merge), a diverged target ends up with both git conflict markers (unmerged index stages) and duplicate *.rej files for the same hunks. The .rej files are spurious — the hunks are already represented as conflict markers / index entries — but they make downstream resolution fragile and noisy.

Observed in rhiza-hooks syncing v0.18.8 → v0.19.3: 14 files left with <<<<<<< / >>>>>>> markers (and git ls-files -u index stages) and 33 *.rej files, all of which were verified spurious (already applied, or byte-identical to the upstream bundle source).

Root cause

rhiza/models/_git_utils.py::_apply_diff (v0.17.6), lines ~356–395:

try:
    subprocess.run([git, "apply", "-3"], input=diff, check=True, ...)
except CalledProcessError as e:
    stderr = ...
    if "lacks the necessary blob" in stderr and base_snapshot and upstream_snapshot:
        return self._merge_file_fallback(...)   # markers only — OK
    if stderr:
        logger.warning(...)
    # Fall back to --reject for conflict files
    subprocess.run([git, "apply", "--reject"], input=diff, check=True, ...)   # <-- re-applies the WHOLE diff
    return False

When git apply -3 does have the blobs, it performs a real 3-way merge: it applies what it can, writes conflict markers, updates the index with unmerged stages, and exits non-zero. Because the stderr is not the "lacks the necessary blob" case, control falls through to git apply --reject — which re-applies the entire same diff from scratch. For hunks git apply -3 already turned into conflict markers, --reject now also drops a .rej. Hence markers and .rej for the same hunks.

The _merge_file_fallback branch is fine (markers only). The bug is the --reject re-run after a partial -3 merge.

Expected

A given hunk should be represented once — either as a conflict marker (preferred, since -3 already produced a resolvable index state) or as a .rej, never both.

Suggested fix

When git apply -3 partially applied (non-zero exit, not the missing-blob case), don't re-run --reject over the full diff. Options:

  • Trust the -3 result: the markers/index ARE the conflict representation — just return False and let the caller report markers.
  • Or, if --reject is still wanted for genuinely-unapplied files, first git checkout/reset the paths -3 already touched so the two strategies don't overlap.
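A minimal sketch of the first option, to make the intended control flow concrete. It is not the actual rhiza code: `apply_diff_once`, the injectable `run` parameter, and the `merge_file_fallback` callable are hypothetical names introduced here, and the sketch assumes the function returns True on a clean apply and False when conflicts remain.

```python
import subprocess
from typing import Callable, Optional

MISSING_BLOB = "lacks the necessary blob"


def apply_diff_once(
    diff: str,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
    merge_file_fallback: Optional[Callable[[], bool]] = None,
    git: str = "git",
) -> bool:
    """Apply `diff` with a 3-way merge and never follow up with --reject.

    Assumed contract (not confirmed by the source): True means a clean
    apply, False means conflicts were left for the caller to report.
    """
    try:
        run([git, "apply", "-3"], input=diff, check=True,
            capture_output=True, text=True)
        return True
    except subprocess.CalledProcessError as e:
        stderr = e.stderr or ""
        if MISSING_BLOB in stderr and merge_file_fallback is not None:
            # Blobs unavailable: defer to the marker-only fallback.
            return merge_file_fallback()
        # -3 already wrote conflict markers and unmerged index stages.
        # Re-running `git apply --reject` here would add a .rej for the
        # same hunks, so trust the -3 result and stop.
        return False
```

One caveat with this approach: if `git apply -3` fails for a reason that leaves no markers at all (a malformed patch, say), returning False alone would hide that, so the caller would probably still want to log the stderr.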

Secondary observation (possibly separate issue)

_clean_orphaned_files did not remove a stale managed file (.rhiza/tests/integration/test_workflow_stubs.py) that was dropped from the manifest in an earlier release — the sync logged "No orphaned files to clean up". It appears orphan detection only considers files present in the immediately previous lock; a file that fell out of the manifest in an earlier sync and was never cleaned becomes a permanent orphan. Worth confirming the comparison set.
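To illustrate the suspected gap, here is a small sketch contrasting the two comparison sets. The real `_clean_orphaned_files` logic is not shown in this issue, so both functions and their parameter names are hypothetical; they only model the hypothesis that orphans are computed from the immediately previous lock.

```python
from typing import Iterable, List, Set


def orphans_from_previous_lock(previous_lock: Iterable[str],
                               current_manifest: Iterable[str]) -> Set[str]:
    """Suspected current behaviour: compare only against the last lock."""
    return set(previous_lock) - set(current_manifest)


def orphans_from_history(lock_history: List[Iterable[str]],
                         current_manifest: Iterable[str],
                         on_disk: Iterable[str]) -> Set[str]:
    """Alternative: anything ever managed, no longer managed, still on disk."""
    ever_managed: Set[str] = set().union(*lock_history)
    return (ever_managed - set(current_manifest)) & set(on_disk)


stale = ".rhiza/tests/integration/test_workflow_stubs.py"
older_lock = {stale, "Makefile"}   # the file was still managed here
previous_lock = {"Makefile"}       # dropped from the manifest, never deleted
manifest = {"Makefile"}
disk = {stale, "Makefile"}

missed = orphans_from_previous_lock(previous_lock, manifest)
found = orphans_from_history([older_lock, previous_lock], manifest, disk)
```

Under this model `missed` is empty while `found` contains the stale test file, which would match the "No orphaned files to clean up" log line. Whether rhiza keeps enough lock history to do the second comparison is an open question.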

Repro

Sync any sufficiently diverged downstream repo from an older ref to a newer one and observe *.rej files for hunks the index merge already turned into conflict markers.
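A rough helper for spotting the doubled representation after a sync. It is a sketch rather than part of rhiza: the function names are made up for illustration, and it assumes the text of `git ls-files -u` and a list of `.rej` paths have been collected separately.

```python
from typing import Iterable, List, Set


def unmerged_paths(ls_files_u_output: str) -> Set[str]:
    """Extract paths from `git ls-files -u` output.

    Each line has the form "<mode> <object> <stage>\t<path>". Paths with
    unusual characters are quoted by git unless -z is used; this sketch
    does not handle that case.
    """
    paths: Set[str] = set()
    for line in ls_files_u_output.splitlines():
        if "\t" in line:
            _meta, path = line.split("\t", 1)
            paths.add(path)
    return paths


def doubled_paths(ls_files_u_output: str, rej_files: Iterable[str]) -> List[str]:
    """Paths that have both unmerged index stages and a sibling .rej file."""
    rej = set(rej_files)
    return sorted(p for p in unmerged_paths(ls_files_u_output)
                  if p + ".rej" in rej)


sample = (
    "100644 1111111111111111111111111111111111111111 1\tMakefile\n"
    "100644 2222222222222222222222222222222222222222 2\tMakefile\n"
    "100644 3333333333333333333333333333333333333333 3\tMakefile\n"
    "100644 4444444444444444444444444444444444444444 2\tpyproject.toml\n"
)
overlap = doubled_paths(sample, ["Makefile.rej", "README.md.rej"])
```

Here `overlap` lists only `Makefile`, the one path represented both ways. A `.rej` with no unmerged counterpart (`README.md.rej` in the sample) is not flagged, since it may be a genuine rejection.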

Filed from downstream tracking issue Jebel-Quant/rhiza-hooks#200.
