Skip to content

feat: handle Ctrl-Z / fg transparently for wrapped commands#174

Merged
jy-tan merged 14 commits into
fencesandbox:mainfrom
voidexpr:feat/handle-ctrl-z
May 30, 2026
Merged

feat: handle Ctrl-Z / fg transparently for wrapped commands#174
jy-tan merged 14 commits into
fencesandbox:mainfrom
voidexpr:feat/handle-ctrl-z

Conversation

@voidexpr
Copy link
Copy Markdown
Contributor

This is all done by claude code but seems reasonable. I did some simple manual testing. I really appreciate your work with fence, I'm just starting to use it but I'm hoping to use it a lot more so hopefully this patch can help. The use case motivating this: fence around claude code and trying to put it in the background to type a few commands before going back to claude code.

When fence wrapped an interactive command on a real TTY, pressing Ctrl-Z left the child stopped while fence kept blocking in Wait(); the user's outer shell never printed "[1]+ Stopped" and the session felt hung.

Add a stop-aware wait4 loop (jobcontrol.go) that runs whenever fence is managing the host TTY foreground. On a child stop it hands the terminal back to fence's own pgrp, SIGSTOPs fence so the outer shell prints the familiar Stopped line, and on resume re-grants the TTY to the child and SIGCONTs the child group.

Notably, the loop deliberately does NOT pass WCONTINUED to wait4. On Darwin the kernel reports a continued child with status 0x137F — the standard 0x7F "stopped" low byte plus SIGCONT (0x13) in the high byte — and golang.org/x/sys/unix's WaitStatus.Stopped() returns true for that encoding (it only checks high != SIGSTOP). With WCONTINUED enabled, every SIGCONT this loop issued to the child came back looking like a fresh Ctrl-Z, sending fence into an infinite [1]+ Stopped loop on every fg. WUNTRACED-only avoids the encoding mismatch and is all the main Ctrl-Z / fg flow needs; the spec's "third-party SIGCONT to fence" case still works because the resume handshake runs inline after each kill(self, SIGSTOP) returns.

Also adds:

  • integration_jobcontrol_darwin_test.go: end-to-end PTY test that pins the Ctrl-Z / SIGCONT round-trip against ps state checks.
  • wait4_probe_test.go: gated diagnostic that records the Darwin wait4 encoding so a future contributor who tries to re-add WCONTINUED sees why it breaks.

When fence wrapped an interactive command on a real TTY, pressing Ctrl-Z
left the child stopped while fence kept blocking in Wait(); the user's
outer shell never printed "[1]+ Stopped" and the session felt hung.

Add a stop-aware wait4 loop (jobcontrol.go) that runs whenever fence is
managing the host TTY foreground. On a child stop it hands the terminal
back to fence's own pgrp, SIGSTOPs fence so the outer shell prints the
familiar Stopped line, and on resume re-grants the TTY to the child and
SIGCONTs the child group.

Notably, the loop deliberately does NOT pass WCONTINUED to wait4. On
Darwin the kernel reports a continued child with status 0x137F — the
standard 0x7F "stopped" low byte plus SIGCONT (0x13) in the high byte —
and golang.org/x/sys/unix's WaitStatus.Stopped() returns true for that
encoding (it only checks high \!= SIGSTOP). With WCONTINUED enabled,
every SIGCONT this loop issued to the child came back looking like a
fresh Ctrl-Z, sending fence into an infinite [1]+ Stopped loop on every
fg. WUNTRACED-only avoids the encoding mismatch and is all the main
Ctrl-Z / fg flow needs; the spec's "third-party SIGCONT to fence" case
still works because the resume handshake runs inline after each
kill(self, SIGSTOP) returns.

Also adds:
- integration_jobcontrol_darwin_test.go: end-to-end PTY test that pins
  the Ctrl-Z / SIGCONT round-trip against `ps` state checks.
- wait4_probe_test.go: gated diagnostic that records the Darwin wait4
  encoding so a future contributor who tries to re-add WCONTINUED sees
  why it breaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@voidexpr voidexpr requested a review from jy-tan as a code owner May 23, 2026 00:41
Copy link
Copy Markdown
Collaborator

@jy-tan jy-tan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A few comments, let me know what you think.

Since shouldManageHostTTYForeground is platform-agnostic, shall we also add an equivalent test for Linux?

Comment thread cmd/fence/main.go
Comment on lines 405 to 418
if err := unix.IoctlSetPointerInt(stdinFd, unix.TIOCSPGRP, childPgrp); err != nil {
if debug {
fencelog.Printf("[fence] Warning: failed to set child as foreground: %v\n", err)
}
}
defer func() {
_ = unix.IoctlSetPointerInt(stdinFd, unix.TIOCSPGRP, savedFgPgrp)
signal.Reset(syscall.SIGTTOU)
}()

jobControlEnabled = true
jobControlStdinFd = stdinFd
jobControlChildPgrp = childPgrp
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to set jobControlEnabled only if TIOCSPGRP to the child actually succeeds?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2623075 — moved the defer for restoring savedFgPgrp and the three jobControl* assignments inside the else branch, so a failed TIOCSPGRP falls back to execCmd.Wait() without activating job-control state that cannot function.

While testing this I found two more bugs in the same Ctrl-Z/bg flow (fixed in b5d31f7):

  1. waitWithJobControl re-grant after bg: after resuming from self-SIGSTOP, the code always called TIOCSPGRP(childPgrp). With fg that is correct; with bg the shell is still the terminal foreground owner, so the call stole the TTY from the shell, causing its read() to return EIO (interpreted as EOF → shell exit). Fix: check TIOCGPGRP after resuming and only re-grant when fence is the foreground.

  2. Deferred cleanup on exit: the defer in runCommand unconditionally called TIOCSPGRP(savedFgPgrp). In an interactive bash session savedFgPgrp is fence's own pgrp (bash assigns each pipeline its own pgrp before handing it the terminal), so the defer was stealing the TTY back from the shell right before exit, leaving an orphaned foreground pgrp and triggering EIO on the shell's next read. Fix: only call TIOCSPGRP(savedFgPgrp) when the child's pgrp is still the terminal foreground; skip it when the shell has already reclaimed the TTY.

// in Wait() while the inner pgrp was stopped, leaving the user's outer
// shell wedged with no `[1]+ Stopped …` notification. This test pins that
// regression down.
func TestMacOS_CtrlZSuspendsFence(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking of a test that resumes and then observes the wrapped command completes (fence is not blocked forever waiting).

something of this shape:

func TestMacOS_CtrlZSuspendsFenceAndResumesChild(t *testing.T) {
	skipIfAlreadySandboxed(t)

	fenceBin := t.TempDir() + "/fence"
	build := exec.Command("go", "build", "-o", fenceBin, "../../cmd/fence") // #nosec G204
	build.Stdout = os.Stdout
	build.Stderr = os.Stderr
	if err := build.Run(); err != nil {
		t.Fatalf("failed to build fence: %v", err)
	}

	cmd := exec.Command(fenceBin, "sh", "-c", "printf 'READY\\n'; sleep 1; printf 'DONE\\n'") // #nosec G204

	ptmx, err := pty.Start(cmd)
	if err != nil {
		t.Fatalf("failed to start fence command with PTY: %v", err)
	}

	var output lockedBuffer
	done := make(chan struct{})
	go func() {
		_, _ = io.Copy(&output, ptmx)
		close(done)
	}()

	defer func() {
		_ = ptmx.Close()
		if cmd.Process != nil {
			// If the process is still stopped, SIGCONT first so Kill/Wait cleanup
			// cannot leave a stopped child behind.
			_ = syscall.Kill(cmd.Process.Pid, syscall.SIGCONT)
			_ = cmd.Process.Kill()
		}
		_ = cmd.Wait()
	}()

	if !waitForOutput(&output, "READY", 2*time.Second) {
		t.Fatalf("command did not become ready\noutput so far:\n%s", output.String())
	}

	if _, err := ptmx.Write([]byte{0x1A}); err != nil {
		t.Fatalf("failed to write Ctrl-Z: %v", err)
	}

	if !waitForProcessState(t, cmd.Process.Pid, "T", 2*time.Second) {
		t.Fatalf("fence did not enter stopped state after Ctrl-Z\noutput so far:\n%s", output.String())
	}

	if err := syscall.Kill(cmd.Process.Pid, syscall.SIGCONT); err != nil {
		t.Fatalf("failed to SIGCONT fence: %v", err)
	}

	if !waitForOutput(&output, "DONE", 5*time.Second) {
		t.Fatalf("child did not resume and complete\noutput so far:\n%s", output.String())
	}

	select {
	case <-done:
	case <-time.After(2 * time.Second):
		t.Fatalf("PTY output did not finish after child completed\noutput so far:\n%s", output.String())
	}

	if err := cmd.Wait(); err != nil {
		t.Fatalf("fence exited with error: %v\noutput:\n%s", err, output.String())
	}
}

with the following race-safe helpers:

type lockedBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}
func (b *lockedBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}
func (b *lockedBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}
func waitForOutput(output *lockedBuffer, needle string, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if strings.Contains(output.String(), needle) {
			return true
		}
		time.Sleep(25 * time.Millisecond)
	}
	return false
}

what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestMacOS_CtrlZSuspendsFenceAndResumesChild in bf130b9, following exactly the shape you suggested. The race-safe helpers (lockedBuffer, waitForOutput) are extracted into a separate integration_jobcontrol_darwin_helpers_test.go (same package and build tag) so both Darwin test functions share one copy.

Also added integration_jobcontrol_linux_test.go (//go:build linux) with TestLinux_CtrlZSuspendsFence and TestLinux_CtrlZSuspendsFenceAndResumesChild — same PTY approach, same ps -o state= polling — to cover the stop/resume paths on Linux where WUNTRACED and TIOCSPGRP are also live.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the stronger resume tests. One cleanup from my earlier suggestion: since CI runs with -race, async PTY capture should use the locked buffer anywhere the test goroutine reads it. The new full round-trip tests do this, but the older duplicate Test*_CtrlZSuspendsFence tests still use plain bytes.Buffer, can we either convert those too or remove the weaker duplicates?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f68efcc — went with "remove the weaker duplicates": the Darwin/Linux harness is now a single //go:build darwin || linux helper (integration_jobcontrol_helpers_test.go) sharing one lockedBuffer, and the per-OS files are thin entry points, so there's no plain bytes.Buffer left and nothing's duplicated. (The locked-buffer conversion of the older tests itself landed earlier in 57dd5e7.)

voidexpr and others added 3 commits May 24, 2026 12:38
If IoctlSetPointerInt(TIOCSPGRP) fails (e.g. EPERM when stdin is not
the controlling terminal), the child is not the foreground pgrp and the
wait4 loop will never see a Stopped event. Guarding the defer and the
three jobControl* assignments inside the else branch lets the code fall
back to execCmd.Wait(), which is safe. Previously the defer for
restoring savedFgPgrp would also fire even though the fg pgrp was never
changed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract lockedBuffer and waitForOutput into a darwin helpers file so
  both Darwin test functions share one race-safe copy.
- Add TestMacOS_CtrlZSuspendsFenceAndResumesChild to cover the full
  Ctrl-Z/fg round-trip: stop half was already tested; this pins the
  resume-and-complete path (waitWithJobControl re-grant + SIGCONT).
- Add integration_jobcontrol_linux_test.go (//go:build linux) with
  TestLinux_CtrlZSuspendsFence and TestLinux_CtrlZSuspendsFenceAndResumesChild
  to verify the same stop/resume paths on Linux, where WUNTRACED and
  TIOCSPGRP are also available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After `bg`, the shell is the terminal foreground owner. Two places
were incorrectly calling TIOCSPGRP and handing the TTY elsewhere:

1. waitWithJobControl: after resuming from self-SIGSTOP, always
   re-granted the terminal to the child pgrp. With `fg` that is
   correct; with `bg` the shell is still the foreground and the
   TIOCSPGRP call caused the shell's stdin read to return EIO
   (interpreted as EOF → shell exit). Fix: check TIOCGPGRP after
   resuming and only re-grant when fence is the foreground (fg path).

2. The deferred cleanup in runCommand: unconditionally called
   TIOCSPGRP(savedFgPgrp) on exit. In an interactive bash session
   savedFgPgrp is fence's own pgrp (bash gave fence its own pgrp
   before handing it the terminal), so the defer was stealing the TTY
   back from the shell right before exit, leaving an empty foreground
   pgrp and causing the shell's next read to return EIO. Fix: only
   restore savedFgPgrp when the child's pgrp is still the terminal
   foreground (i.e. we own it); skip the call when the shell has
   already reclaimed the TTY.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@voidexpr
Copy link
Copy Markdown
Contributor Author

voidexpr commented May 26, 2026

So with claude code's help I tried to address the comments and found some extra issues. I've tested on both MacOS and Linux. Looking more I'm finding an issue I want to investigate more, I'll let you know when this is ready.

When fence calls execCmd.Start() the child is in its own pgrp but not
yet the terminal foreground — TIOCSPGRP is called a few syscalls later.
Interactive apps (e.g. Claude Code / Node.js) may read from stdin during
that window, receive SIGTTIN (background read from controlling terminal),
and stop. waitWithJobControl was treating every ws.Stopped() event as a
user Ctrl-Z: it handed the TTY back to fence's pgrp and stopped fence
itself, leaving both processes stopped and the session appearing hung.

Check ws.StopSignal() before the Ctrl-Z path: if the signal is SIGTTIN,
the child stopped due to the start-up race rather than user input. By
the time wait4 returns, TIOCSPGRP has already made the child the
foreground pgrp, so sending SIGCONT lets it retry the read successfully.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@voidexpr
Copy link
Copy Markdown
Contributor Author

ok fixed one issue I found, it's in a reviewable state.

Copy link
Copy Markdown
Collaborator

@jy-tan jy-tan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Just a few more comments.

Comment thread cmd/fence/jobcontrol.go
Comment on lines +77 to +88
case ws.Stopped():
// SIGTTIN means the child read from the terminal while it was
// still in the background — a race between Start() and the
// initial TIOCSPGRP call in the caller. The child is now the
// foreground pgrp, so resume it; do not treat this as Ctrl-Z.
if ws.StopSignal() == syscall.SIGTTIN {
if debug {
fencelog.Printf("[fence:jobctl] child stopped on SIGTTIN (start race); resuming\n")
}
_ = syscall.Kill(-childPgrp, syscall.SIGCONT)
continue
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SIGTTIN branch assumes the child is already the foreground pgrp. That’s true for the startup race, but not after Ctrl-Z -> bg; a background child reading from stdin should remain stopped, not be auto-SIGCONTed. Can we gate this on TIOCGPGRP == childPgrp before resuming?

i.e.,

if ws.StopSignal() == syscall.SIGTTIN {
	currentFg, fgErr := unix.IoctlGetInt(stdinFd, unix.TIOCGPGRP)
	if fgErr == nil && currentFg == childPgrp {
		if debug {
			fencelog.Printf("[fence:jobctl] child stopped on SIGTTIN after foreground handoff; resuming\n")
		}
		_ = syscall.Kill(-childPgrp, syscall.SIGCONT)
		continue
	}
	if debug {
		fencelog.Printf("[fence:jobctl] child stopped on SIGTTIN while not foreground (currentFg=%d); treating as job-control stop\n", currentFg)
	}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a8cead2, implemented as you suggested: the SIGTTIN branch now reads TIOCGPGRP and only auto-SIGCONTs when currentFg == childPgrp (the foreground-handoff race). When the child stopped on SIGTTIN while not foreground (after Ctrl-Zbg), it falls through to the normal job-control stop so it stays stopped and you can fg it.

One alternative Claude proposed: in the not-foreground case, continue and leave the child stopped without fence touching the terminal at all — instead of falling through to tcsetpgrp(self) + self-SIGSTOP while the shell currently owns the terminal. That sidesteps any momentary TTY handoff while the shell is foreground (the concern behind b5d31f7), but it doesn't surface the stop to the outer shell, so fg wouldn't recover the child. I went with your fall-through since the transient handoff is the same one the normal Ctrl-Z path already relies on — but if you'd rather not touch the terminal in the bg case, the continue variant is a one-line change. Preference?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, leaning toward the continue variant because the shell already owns the terminal in this path, and a background fence calling TIOCSPGRP(self) seems like the thing we just fixed for Ctrl-Z -> bg. But I agree this needs proof either way. Can we add/manual-test Ctrl-Z -> bg, then a child stdin read, and verify the shell stays alive and fg recovers cleanly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the third path rather than a bare continue, because continue alone deadlocks on fg recovery: the child stays stopped on SIGTTIN, fence keeps running (never self-SIGSTOPs), so there is nothing for fg to bring back and nobody to re-grant the TTY.

Comment thread cmd/fence/main.go
Comment on lines 405 to 408
if err := unix.IoctlSetPointerInt(stdinFd, unix.TIOCSPGRP, childPgrp); err != nil {
if debug {
fencelog.Printf("[fence] Warning: failed to set child as foreground: %v\n", err)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TIOCSPGRP(childPgrp) fails, we've already ignored SIGTTOU but never reset it. Can we reset SIGTTOU in the error branch before falling back to normal Wait()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f6f9f40. I register signal.Reset(SIGTTOU) with defer immediately after signal.Ignore(SIGTTOU), so it's reset on every exit path — including the TIOCSPGRP-failure fallback to Wait() you flagged. Used a defer rather than only the error branch so no later early-return can leak the ignored disposition; the foreground-pgrp restore stays success-only.

// in Wait() while the inner pgrp was stopped, leaving the user's outer
// shell wedged with no `[1]+ Stopped …` notification. This test pins that
// regression down.
func TestMacOS_CtrlZSuspendsFence(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the stronger resume tests. One cleanup from my earlier suggestion: since CI runs with -race, async PTY capture should use the locked buffer anywhere the test goroutine reads it. The new full round-trip tests do this, but the older duplicate Test*_CtrlZSuspendsFence tests still use plain bytes.Buffer, can we either convert those too or remove the weaker duplicates?

voidexpr and others added 8 commits May 26, 2026 05:32
Both TestMacOS_CtrlZSuspendsFence and TestLinux_CtrlZSuspendsFence were
writing to a plain bytes.Buffer from a goroutine while the test body read
it on error paths, creating a data race under -race.  Switch them to the
existing locked-buffer types so all four job-control tests are race-safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both the main.go foreground-restore defer and the jobcontrol.go fg re-grant
encode the same invariant — only call TIOCSPGRP if we still own the terminal
foreground — with different reference/target pgrps. Extract a single
setForegroundIfOwner helper used by both so a future tweak to the guard can't
diverge between the two sites. No behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The SIGTTIN branch treated every SIGTTIN stop as the Start()/TIOCSPGRP
foreground-handoff race and unconditionally re-SIGCONTed the child. After
Ctrl-Z -> bg the shell owns the terminal and the child is in the background; if
it then reads stdin the kernel stops it with SIGTTIN, this loop resumed it, it
read again, and spun at 100% CPU. Gate the resume fast path on
TIOCGPGRP == childPgrp; when the child is not the foreground pgrp, fall through
to the normal job-control stop so it stays stopped and the outer shell can fg
it later.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
signal.Ignore(SIGTTOU) ran unconditionally, but signal.Reset lived only in the
TIOCSPGRP-success defer. A failed TIOCSPGRP fell through to the execCmd.Wait()
path with SIGTTOU still ignored process-wide, which can mask later background
terminal writes/ioctls. Register the reset immediately after Ignore so it runs
on every exit path; the foreground-pgrp restore stays success-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The job-control path reaps the child via wait4 and never calls execCmd.Wait(),
so the os.Process is never marked done and the SignalForwarder stays live until
the LIFO-deferred cleanup() runs (after the TTY restore and linuxMonitors.Stop).
On macOS (no pidfd) a programmatic signal in that window could route to a
reaped/recycled PID. Call cleanup() (idempotent) right after waitWithJobControl
returns to close the multi-defer window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The job-control path returns 128+signal for a signaled child, but the
non-job-control execCmd.Wait() fallback used exitErr.ExitCode(), which reports
-1 for that case, so an identical "child killed by signal" yielded different
fence exit codes depending on whether stdin was a TTY. Extract resolveExitCode
to apply the shell convention on both paths, and add a unit test covering the
normal-exit and signaled cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The wait4 switch handled only Stopped/Exited/Signaled. WUNTRACED should never
report anything else, but the surrounding logic is platform-sensitive, so a
status matching none of them would silently re-block with no diagnostic. Add a
default that logs the raw status under debug.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
lockedBuffer/waitForOutput/waitForProcessState and the two Ctrl-Z test bodies
were duplicated verbatim between the Darwin file and the linux-prefixed copies.
Move the shared harness into a //go:build darwin || linux helper
(integration_jobcontrol_helpers_test.go) and reduce the per-OS files to thin
OS-named entry points. `ps -o state=` behaves the same on both, so there were
no platform-specific assertions to keep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@voidexpr
Copy link
Copy Markdown
Contributor Author

After addressing your three comments I ran a double Codex + Claude code review of this round and it turned up more than I expected. I folded the fixes into this PR, let me know if it's still manageable.

Extra changes:

  • edc154f refactor: extract setForegroundIfOwner for guarded tcsetpgrp: companion to comment 1; dedups the "only tcsetpgrp if we still own the foreground" guard across main.go + jobcontrol.go (no behavior change).
  • 6efc59c fix: stop signal forwarder once the job-control child is reaped: the job-control path reaps via wait4 and never calls execCmd.Wait(), leaving the SignalForwarder live during deferred cleanup; on macOS (no pidfd) it could signal a recycled PID.
  • 91e8b2d fix: report 128+signal for a signaled child on the fallback path: the non-TTY path returned -1 for a signaled child while the job-control path returned 128+sig; aligned both (+ unit test).
  • 6a9305b fix: log unexpected wait status in the job-control loop: debug-only default instead of silently re-blocking.

Two further findings were considered and rejected: assuming fg on a TIOCGPGRP error during resume (would risk re-introducing the b5d31f7 TTY-steal), and preserving exit status on a non-EINTR Wait4 error (can't occur for a child we own and reap once).

After Ctrl-Z -> bg, the shell reclaims the terminal. If the bg child
then reads stdin it stops on SIGTTIN. The previous fall-through called
TIOCSPGRP(selfPgrp) unconditionally — stealing the TTY from the shell,
the same class of bug fixed in b5d31f7 for the Ctrl-Z resume path.

Restructure waitWithJobControl's Stopped case into an if/else:
- SIGTTIN + child is foreground: start-race auto-resume (unchanged).
- SIGTTIN + child not foreground: self-SIGSTOP without TIOCSPGRP so
  the shell can fg fence; the common resume path below then re-grants
  the TTY to the child.
- All other stops (SIGTSTP): TIOCSPGRP(selfPgrp) + self-SIGSTOP
  (unchanged).

Add ctrlZBgStdinReadFgRecovery integration test (Darwin + Linux) that
covers the Ctrl-Z / fg / child-stdin-read round-trip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@voidexpr
Copy link
Copy Markdown
Contributor Author

voidexpr commented May 30, 2026

Let me know how it looks now. I did some manual tests (simple ones) and added some more automated tests

@jy-tan jy-tan merged commit 5bfc2ab into fencesandbox:main May 30, 2026
5 checks passed
@jy-tan
Copy link
Copy Markdown
Collaborator

jy-tan commented May 30, 2026

Thanks @voidexpr! I'll add to a new release within a couple days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants