Skip to content

prevent symlink loop recursion in directory traversal#289

Open
jmestwa-coder wants to merge 1 commit into
bazel-contrib:mainfrom
jmestwa-coder:replay-fix-symlink-loop-traversal
Open

prevent symlink loop recursion in directory traversal#289
jmestwa-coder wants to merge 1 commit into
bazel-contrib:mainfrom
jmestwa-coder:replay-fix-symlink-loop-traversal

Conversation

@jmestwa-coder
Copy link
Copy Markdown

Summary

Prevent unbounded recursion in directory traversal caused by symlink loops.

Problem

Traversal currently follows symlinked directories (via stat), which can create cycles (e.g., dirA → dirB → dirA) and lead to unbounded recursion and unstable behavior.

Fix

  • Use lstat to inspect paths without following symlinks
  • Recurse only into real directories (S_ISDIR && !S_ISLNK)
  • Avoid entering symlinked directory cycles

Behavior

  • Regular files and directories are unaffected
  • Symlink entries are still reported but not traversed
  • Traversal is now bounded and deterministic

Test

  • Added a regression test with a symlink loop (dirA ↔ dirB)
  • Verifies traversal runs and remains bounded (> 0 and <= 6)
  • Skips safely if symlinks are unsupported

Scope

  • Limited to traversal logic in file_util.cc
  • No changes to APIs, file handling, or build configuration

Comment thread fuzzing/replay/file_util.cc Outdated
Comment on lines +74 to +78
if (lstat(std::string(path).c_str(), &path_stat) < 0) {
return ErrnoStatus(absl::StrCat("could not stat ", path), errno);
}
if (S_ISDIR(path_stat.st_mode)) {
// Do not recurse into symlinked directories to avoid traversal loops.
if (S_ISDIR(path_stat.st_mode) && !S_ISLNK(path_stat.st_mode)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

This fixes walking symlink cycles, but it also changes the metadata passed to the callback from stat() target metadata to lstat() symlink metadata. The replayer filters for regular files which would now skip symlinked files which is likely not intended.

I'm also not sure we want to drop support for symlinked directories entirely. A more compatible fix would likely keep track of visited directories and use that to break symlink cycles without changing normal symlink handling.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback

I’ve updated the patch to keep using stat() so metadata semantics are preserved, and symlinked files/directories continue to behave as before. Instead of skipping symlinks, the fix now tracks visited directories using (st_dev, st_ino) to prevent cycles while still allowing traversal.

@jmestwa-coder jmestwa-coder force-pushed the replay-fix-symlink-loop-traversal branch 2 times, most recently from dbccd9b to 12b9372 Compare April 22, 2026 07:51
@jmestwa-coder jmestwa-coder force-pushed the replay-fix-symlink-loop-traversal branch from 12b9372 to 3b8b6af Compare April 22, 2026 08:16
@jmestwa-coder
Copy link
Copy Markdown
Author

Hi @simonresch any updates on this PR? Thanks!

if (stat(std::string(path).c_str(), &path_stat) < 0) {
return ErrnoStatus(absl::StrCat("could not stat ", path), errno);
}
callback(path, path_stat);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callback is here called unconditionally including on directories. This likely breaks existing behavior and fails the test for the file utility functions. You can run them on your end with the command bazel test //fuzzing/replay:file_util_test or bazel test //... to run all tests.

The direction to detect cycles on a pair of path_stat.st_dev, path_stat.st_ino makes sense to me.

Comment on lines +140 to +141
EXPECT_GT(collected_paths.size(), 0);
EXPECT_LE(collected_paths.size(), 6);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests would need one or two normal files after adjusting the callback handling. Also, is there a reason to assert a range of size values instead of the exact number of expected files?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants