Skip to content

Conversation

@bobrik
Copy link
Contributor

@bobrik bobrik commented Dec 26, 2025

For the ELF file itself blazesym does the right thing by reading files like:

/proc/2367/map_files/79c8000-20aec000

This works great when the ELF files are in a container in a separate mount namespace. When it comes to looking for split debug info via debuglink, it tries looking in by the path in the root mount namespace, which does not work, as the files are just not there. With this change we try checking the process mount namespace first via /proc/<xxx>/root.

Here's a practical example with ClickHouse running in a container with debug symbols installed inside:

  • Before:
ivan@cube:~/projects/blazesym/cli$ sudo ../target/debug/blazecli symbolize process --pid 2367 0x17658f30
2025-12-26T03:04:41.331043Z  WARN debug link references destination `clickhouse.debug` which was not found in any known location
0x00000017658f30: DB::Block::Block(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> >&&) @ 0x17658e80+0xb0
  • After:
ivan@cube:~/projects/blazesym/cli$ sudo ../target/debug/blazecli symbolize process --pid 2367 0x17658f30
0x00000017658f30: DB::Block::Block(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> >&&) @ 0x17658e80+0xb0 ./ci/tmp/build/./src/Core/Block.cpp:175:5
                  DB::Block::initializeIndexByName() @ ./ci/tmp/build/./src/Core/Block.cpp:182:23 [inlined]
                  std::__1::pair<std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long>, void*>*> >, bool> std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long, DB::StringHashForHeterogeneousLookup, std::__1::equal_to<void>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, unsigned long> > >::emplace[abi:ne190107]<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, unsigned long&>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, unsigned long&) @ ./ci/tmp/build/./contrib/llvm-project/libcxx/include/unordered_map:1255:21 [inlined]
                  _ZNSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEEmEENS_22__unordered_map_hasherIS7_S8_N2DB32StringHashForHeterogeneousLookupENS_8equal_toIvEELb1EEENS_21__unordered_map_equalIS7_S8_SD_SB_Lb1EEENS5_IS8_EEE16__emplace_uniqueB8ne190107IRS7_RmTnNS_9enable_ifIXsr21__can_extract_map_keyIT_S7_NS_4pairIKS7_mEEEE5valueEiE4typeELi0EEENSO_INS_15__hash_iteratorIPNS_11__hash_nodeIS8_PvEEEEbEEOSN_OT0_ @ ./ci/tmp/build/./contrib/llvm-project/libcxx/include/__hash_table:804:12 [inlined]

@codecov
Copy link

codecov bot commented Dec 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.85%. Comparing base (9983fcf) to head (4220b95).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1399   +/-   ##
=======================================
  Coverage   95.84%   95.85%           
=======================================
  Files          61       61           
  Lines       11076    11087   +11     
=======================================
+ Hits        10616    10627   +11     
  Misses        460      460           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

For the ELF file itself blazesym does the right thing by reading files like:

```
/proc/2367/map_files/79c8000-20aec000
```

This works great when the ELF files are in a container in a separate mount
namespace. When it comes to looking for split debug info via `debuglink`,
it tries looking in by the path in the root mount namespace, which does not
work, as the files are just not there. With this change we try checking
the process mount namespace first via `/proc/<xxx>/root`.

Signed-off-by: Ivan Babrou <github@ivan.computer>
@bobrik bobrik force-pushed the ivan/mount-ns-debug branch from 28da791 to 4220b95 Compare December 26, 2025 03:18
@bobrik
Copy link
Contributor Author

bobrik commented Dec 26, 2025

Arguably, looking up elf by name like /proc/2367/root/usr/bin/clickhouse should also try checking the mount namespace, since it's already provided in the path itself. I can push a commit for that if it makes sense.

@bobrik
Copy link
Contributor Author

bobrik commented Dec 26, 2025

Or perhaps I'm going about it all wrong and the users should just call into set_debug_dirs with /proc/<xxx>/root prefixed debug dirs?

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 26, 2025

Hi @bobrik, this certainly looks like something we should fix, but I haven't looked at your proposal closely yet (will do so later today or early next week the latest). That being said, I'd think we'd want a unit/regression test for reproducing this issue. Do you think it would be possible to add one?

Copy link
Collaborator

@d-e-s-o d-e-s-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change seems fine from my perspective, thanks! Left two comments. Please take a look and then see if we can add a test for this use as mentioned earlier.

Comment on lines +546 to +547


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +549 to +556
self.symbolizer
.elf_cache
.elf_resolver(entry_path, debug_dirs.as_deref())
} else {
let path = &entry_path.symbolic_path;
self.symbolizer
.elf_cache
.elf_resolver(path, self.symbolizer.maybe_debug_dirs(self.debug_syms))
.elf_resolver(path, debug_dirs.as_deref())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I think the logic is mildly and subtly wrong now: the cache expects the list of debug directories to always be the same, because the "resolver" being looked up will be the same for the same path. However, what can happen now is that the first process has root directory / and a subsequent one uses something else, then, because of caching, we wouldn't be checking the new root for debug link targets.

Realistically, it's probably rare and the fix won't be trivial, but let's at least add a TODO about this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having different libc on the inside and the outside of the container is definitely not rare.

Let me see how caching works to understand this better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think it will probably require quite a bit of refactoring to make this work properly, unless you have some magical idea. If you come up short, I can take care of it in early-to-mid January time frame.

Comment on lines +537 to +545
let debug_dirs = self
.symbolizer
.maybe_debug_dirs(self.debug_syms)
.map(|dirs| {
dirs.iter()
.filter_map(|dir| Some(entry_path.root_path.join(dir.strip_prefix("/").ok()?)))
.chain(dirs.iter().cloned())
.collect::<Vec<_>>()
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic seems fine, but can we rewrite to not allocate a new Vec in case root_path is /, please?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I see we don't deref the /proc/{pid}/root symbolic link, so this won't fly; but now I am wondering whether we perhaps should...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants