Skip to content

fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406

Open
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:fix/jailer-dynamic-fd-closure
Open

fix(jailer): replace hardcoded FD limits with dynamic closure strategies#406
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:fix/jailer-dynamic-fd-closure

Conversation

@lilongen
Copy link
Copy Markdown

Summary

  • Replace hardcoded FD upper bounds (1024 on Linux, 4096 on macOS) with dynamic limits from getrlimit(RLIMIT_NOFILE), fixing FD leakage on systems with raised ulimit -n
  • Add /proc/self/fd enumeration via raw getdents64 syscall as an efficient middle strategy on Linux (zero heap allocation, async-signal-safe)
  • Restructure close_fds_from() into a 3-strategy cascade: close_range/proc/self/fd → brute-force with dynamic limit

Problem

The FD cleanup in the pre_exec hook used hardcoded upper bounds:

  • Linux: for fd in first_fd..1024
  • macOS: for fd in first_fd..4096

On production systems with ulimit -n 65536 or higher, any FDs above these limits leaked into the jailed process, potentially exposing credentials, database connections, or network sockets inherited from the parent.

Solution

Linux — 3-strategy cascade

Strategy Condition Complexity Description
1. close_range Linux 5.9+ O(1) Kernel closes all FDs in range (already existed)
2. /proc/self/fd /proc mounted O(open FDs) Enumerate via raw getdents64, close only open FDs
3. Brute-force Fallback O(ulimit) Close first_fd..getrlimit(RLIMIT_NOFILE)

macOS

Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE) (replaces hardcoded 4096).

Key constraints

All operations are async-signal-safe (required for pre_exec context):

  • No heap allocation — stack buffer for getdents64, byte literals for paths
  • No io::Error, String, Vec, CString — raw syscalls only
  • getrlimit is POSIX async-signal-safe
  • RLIM_INFINITY handled by capping to 1,048,576 (Linux nr_open default)

Test plan

  • test_close_high_numbered_fd — creates FD 2000 (above old limits), verifies closure
  • test_get_max_fd_returns_positive — validates dynamic limit query
  • test_parse_fd_from_name (Linux) — covers valid FDs, ./.., empty, i32 overflow
  • All 4 existing tests continue to pass
  • cargo clippy -D warnings — zero warnings on both platforms
  • cargo fmt --check — clean
  • Tested on macOS ARM64 (6/6 passed) and Linux ARM64 via Lima (7/7 passed)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 26, 2026 03:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the jailer’s pre-exec FD cleanup to avoid leaking inherited file descriptors on hosts with high ulimit -n, while maintaining the async-signal-safe constraint required by CommandExt::pre_exec.

Changes:

  • Replace hardcoded FD upper bounds with a dynamic limit from getrlimit(RLIMIT_NOFILE).
  • Add a Linux-only middle strategy that enumerates /proc/self/fd via raw getdents64 to close only actually-open FDs.
  • Add tests covering high-numbered FD closure and Linux FD-name parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +71 to +95
loop {
let nread = unsafe {
libc::syscall(
libc::SYS_getdents64,
dir_fd,
buf.as_mut_ptr(),
buf.len() as libc::c_uint,
)
};

if nread <= 0 {
break;
}

let mut offset = 0usize;
while offset < nread as usize {
// SAFETY: getdents64 returns packed linux_dirent64 structs.
// d_reclen is at byte offset 16. Use read_unaligned because the
// buffer is a byte array and the u16 field may not be 2-byte aligned.
let d_reclen =
unsafe { buf.as_ptr().add(offset + 16).cast::<u16>().read_unaligned() } as usize;

if d_reclen == 0 || offset + d_reclen > nread as usize {
break;
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close_fds_via_proc currently returns true even if getdents64 fails (e.g., syscall returns -1 for EINTR/EBADF). That would skip the brute-force fallback and can leave FDs open. Consider treating nread < 0 (and also malformed d_reclen conditions) as a failure: close dir_fd and return false so brute_force_close_fds() runs.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +40
if result == 0 && rlim.rlim_cur > 0 {
if rlim.rlim_cur < i32::MAX as u64 {
rlim.rlim_cur as i32
} else {
// rlim_cur is RLIM_INFINITY or very large; cap to a safe maximum.
// 1048576 (2^20) matches Linux's default /proc/sys/fs/nr_open.
1_048_576
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_max_fd() uses if rlim.rlim_cur < i32::MAX as u64 which incorrectly treats an exact i32::MAX soft limit as “too large” and caps it to 1,048,576. This looks like an off-by-one comparison; use <= if the intent is “fits in i32”.

Copilot uses AI. Check for mistakes.
Comment thread src/boxlite/src/jailer/common/fd.rs Outdated
Comment on lines +42 to +43
// getrlimit failed or returned 0; safe default per POSIX OPEN_MAX
1024
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getrlimit(RLIMIT_NOFILE) fails (or returns 0), get_max_fd() falls back to 1024, which can reintroduce the original FD-leak problem on hosts with a higher limit (since brute-force will stop at 1024). Consider a fallback that still errs on the side of closing more FDs (e.g., a higher cap like the existing 1,048,576) to preserve the security invariant even when getrlimit is blocked or fails.

Suggested change
// getrlimit failed or returned 0; safe default per POSIX OPEN_MAX
1024
// getrlimit failed or returned 0; use the same conservative cap to
// ensure we still attempt to close a wide range of FDs.
1_048_576

Copilot uses AI. Check for mistakes.
@lilongen lilongen force-pushed the fix/jailer-dynamic-fd-closure branch from 28e2ce4 to 5bf706b Compare April 2, 2026 14:37
@DorianZheng
Copy link
Copy Markdown
Member

Same getdents64 bug as #407: in fd.rs:71-118, nread < 0 is treated as success so close_fds_via_proc() returns true on failure and the getrlimit-based fallback never kicks in. Fix is to return false when getdents64 fails. Would also be good to add a test that exercises this error path.

lile and others added 2 commits April 21, 2026 14:04
The FD cleanup in pre_exec used hardcoded upper bounds (1024 on Linux,
4096 on macOS). On systems with raised ulimit -n, FDs above these limits
leaked into jailed processes, potentially exposing credentials, database
connections, or network sockets.

Replace with a 3-strategy cascade (Linux):
1. close_range(first_fd, ~0U, 0) — O(1), Linux 5.9+
2. /proc/self/fd enumeration via raw getdents64 — no heap allocation
3. Brute-force close with dynamic limit from getrlimit(RLIMIT_NOFILE)

macOS uses brute-force with dynamic getrlimit limit (replaces hardcoded 4096).

All operations remain async-signal-safe for the pre_exec context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR boxlite-ai#406 review feedback from Copilot and DorianZheng:

- Return false from close_fds_via_proc when getdents64 fails (nread < 0)
  so the brute-force fallback runs instead of silently skipping FDs
- Fix off-by-one: use <= i32::MAX instead of < i32::MAX in get_max_fd()
- Raise getrlimit failure fallback from 1024 to 1_048_576 to preserve
  the security invariant on hosts where getrlimit is blocked
- Add test for brute-force fallback path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 21, 2026 06:06
@lilongen lilongen force-pushed the fix/jailer-dynamic-fd-closure branch from 5bf706b to f90c941 Compare April 21, 2026 06:06
@lilongen
Copy link
Copy Markdown
Author

Addressed all review feedback in commit f90c941:

@copilot + @DorianZhengclose_fds_via_proc returns false on getdents64 failure

  • Split nread <= 0 into two checks: nread < 0 (error → return false, triggers brute-force fallback) and nread == 0 (normal EOF)

@copilot — Off-by-one in get_max_fd()

  • Changed < i32::MAX to <= i32::MAX so exact i32::MAX soft limit is handled correctly

@copilot — Fallback too low when getrlimit fails

  • Raised from 1024 to 1_048_576 to preserve security invariant

@DorianZheng — Added test for fallback path

  • New test_brute_force_fallback_closes_fds exercises brute_force_close_fds() directly

All tests pass, clippy clean.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +401 to +434
// Raise the soft FD limit so we can dup2 to a high FD number.
// Some systems default to ulimit -n 1024, which would prevent dup2 to FD 2000.
let new_limit = libc::rlimit {
rlim_cur: 4096,
rlim_max: 4096,
};
// SAFETY: setrlimit is async-signal-safe. Raising soft limit within hard limit is allowed.
if unsafe { libc::setrlimit(libc::RLIMIT_NOFILE, &new_limit) } != 0 {
// Can't raise limit (hard limit too low) — skip test gracefully
return 0;
}

// dup2 stdout to a high FD number that exceeds the old hardcoded limits.
// Use 2000 on all platforms — above old Linux limit (1024) and reasonable.
let high_fd: i32 = 2000;
let result = unsafe { libc::dup2(STDOUT_FD, high_fd) };
if result != high_fd {
// dup2 failed — unexpected after raising ulimit
return 1;
}

// Verify the high FD is open
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } < 0 {
return 2;
}

// Close all inherited FDs
if close_inherited_fds_raw().is_err() {
return 3;
}

// The high FD should now be closed
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } != -1 {
return 4;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

child_close_high_numbered_fd returns 0 (passing the test) when setrlimit(RLIMIT_NOFILE, 4096) fails, which can make the test vacuously succeed on systems where the hard limit is low and the regression would otherwise go untested. Consider deriving a high_fd from the current hard limit (e.g., raise rlim_cur up to rlim_max, then pick high_fd = min(2000, rlim_cur - 1) and only skip if rlim_cur can’t exceed the old hardcoded bound). This keeps the test meaningful across more CI environments.

Suggested change
// Raise the soft FD limit so we can dup2 to a high FD number.
// Some systems default to ulimit -n 1024, which would prevent dup2 to FD 2000.
let new_limit = libc::rlimit {
rlim_cur: 4096,
rlim_max: 4096,
};
// SAFETY: setrlimit is async-signal-safe. Raising soft limit within hard limit is allowed.
if unsafe { libc::setrlimit(libc::RLIMIT_NOFILE, &new_limit) } != 0 {
// Can't raise limit (hard limit too low) — skip test gracefully
return 0;
}
// dup2 stdout to a high FD number that exceeds the old hardcoded limits.
// Use 2000 on all platforms — above old Linux limit (1024) and reasonable.
let high_fd: i32 = 2000;
let result = unsafe { libc::dup2(STDOUT_FD, high_fd) };
if result != high_fd {
// dup2 failed — unexpected after raising ulimit
return 1;
}
// Verify the high FD is open
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } < 0 {
return 2;
}
// Close all inherited FDs
if close_inherited_fds_raw().is_err() {
return 3;
}
// The high FD should now be closed
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } != -1 {
return 4;
const OLD_HARDCODED_LOWER_BOUND: i32 = 1024;
const TARGET_HIGH_FD: i32 = 2000;
let mut rlim = libc::rlimit {
rlim_cur: 0,
rlim_max: 0,
};
// SAFETY: getrlimit is async-signal-safe and rlim points to valid stack storage.
if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) } != 0 {
return 1;
}
// Raise the soft limit as far as the current hard limit allows so we can
// test a high-numbered FD even on systems where 4096 is not permitted.
if rlim.rlim_cur < rlim.rlim_max {
let raised_limit = libc::rlimit {
rlim_cur: rlim.rlim_max,
rlim_max: rlim.rlim_max,
};
// SAFETY: setrlimit is async-signal-safe. Raising rlim_cur up to rlim_max is valid.
if unsafe { libc::setrlimit(libc::RLIMIT_NOFILE, &raised_limit) } == 0 {
rlim = raised_limit;
} else {
// Re-read the effective limit in case another policy or prior state changed it.
// SAFETY: getrlimit is async-signal-safe and rlim points to valid stack storage.
if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) } != 0 {
return 2;
}
}
}
let effective_limit = if rlim.rlim_cur > i32::MAX as libc::rlim_t {
i32::MAX
} else {
rlim.rlim_cur as i32
};
if effective_limit <= OLD_HARDCODED_LOWER_BOUND {
// Can't exercise the regression meaningfully on this system.
return 0;
}
let high_fd = std::cmp::min(TARGET_HIGH_FD, effective_limit - 1);
// dup2 stdout to a high FD number that exceeds the old hardcoded limits.
let result = unsafe { libc::dup2(STDOUT_FD, high_fd) };
if result != high_fd {
return 3;
}
// Verify the high FD is open
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } < 0 {
return 4;
}
// Close all inherited FDs
if close_inherited_fds_raw().is_err() {
return 5;
}
// The high FD should now be closed
if unsafe { libc::fcntl(high_fd, libc::F_GETFD) } != -1 {
return 6;

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +105
if d_reclen == 0 || offset + d_reclen > nread as usize {
break;
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In close_fds_via_proc, if d_reclen is 0 or would run past nread, the code breaks out of the inner loop but still leaves success as true and continues the outer loop. That can cause the function to return true (skipping the brute-force fallback) even though directory parsing was incomplete, potentially leaving some FDs unclosed. Consider treating this as a hard failure: set success = false and break out of the outer read loop when the record looks malformed/inconsistent, so callers reliably fall back to the brute-force strategy.

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +86
// getdents64 failed (e.g., EINTR, EBADF); signal failure so
// the brute-force fallback runs instead.
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getdents64 can return -1 with EINTR; currently that is treated as a failure and triggers the brute-force fallback. That keeps correctness, but can cause a large (and avoidable) performance hit in pre_exec if an interrupt occurs. Consider checking errno and retrying the getdents64 call on EINTR, only falling back on other errors.

Suggested change
// getdents64 failed (e.g., EINTR, EBADF); signal failure so
// the brute-force fallback runs instead.
// SAFETY: errno is thread-local and may be read immediately after
// the failing syscall to distinguish EINTR from real failures.
let errno = unsafe { *libc::__errno_location() };
if errno == libc::EINTR {
continue;
}
// getdents64 failed for a non-retryable reason (e.g., EBADF);
// signal failure so the brute-force fallback runs instead.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants