Faster elf symbol type #683

Open
DanielBotnik wants to merge 2 commits into
angr:master from
DanielBotnik:faster-elf-symbol-type

Conversation

@DanielBotnik
Contributor

No description provided.

@DanielBotnik DanielBotnik force-pushed the faster-elf-symbol-type branch 2 times, most recently from 89881d9 to b336ee4 Compare May 16, 2026 17:48
Member

@rhelmot rhelmot left a comment

Very clean. Just address these issues and I'll merge it when CI passes.

Comment thread cle/backends/elf/symbol_type.py Outdated


@cache
def parse_symbol_type(elf_value, arch_list):
Member

Can you add type annotations for this?

Member

Since it's a tuple, we should probably not have it named arch_list. Maybe just arches?

Member

I am not sure that we want to cache the result of parse_symbol_type() forever in memory.

Member

The cache should have a fixed upper size of "every possible type", which is finite. I think this is fine.

Member

Is this cache going to be used at all once the binary finishes loading? We should drop the cache if it’s never used after.

Context: we are trying to reduce memory usage of angr’s static analysis use cases.
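For reference, a minimal sketch of what "dropping the cache" could look like with `functools.cache`. The function body and the `load_binary` helper are hypothetical stand-ins, not cle code; only `cache_clear()` and `cache_info()` are the standard `functools` API.

```python
from functools import cache


@cache
def parse_symbol_type(elf_value: int, arches: tuple) -> tuple:
    # Stand-in body: the real function resolves enum members instead.
    return (elf_value, arches[0])


def load_binary(symbol_values, arches):
    """Hypothetical loader step: resolve every symbol, then drop the cache."""
    resolved = [parse_symbol_type(v, arches) for v in symbol_values]
    # Release every memoized entry once loading is finished.
    parse_symbol_type.cache_clear()
    return resolved


resolved = load_binary([1, 2, 2, 1, 2], ("AMD64", None))
info = parse_symbol_type.cache_info()  # currsize is 0 after cache_clear()
```

Note that `cache_clear()` also resets the hit and miss counters, so any measurement has to be read before clearing.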

Contributor Author

@DanielBotnik DanielBotnik May 21, 2026

Thanks for the review! I measured the cache empirically to settle the size question with concrete numbers.

After loading 6 diverse binaries (sshd MIPS-BE32, ltrace ARM-LE32, /usr/bin/{ls,bash,python3.10}, libc.so.6 — ~31k symbols total) the cache plateaus at 17 entries, with 22,489 hits / 17 misses (99.92% hit rate). Per-binary growth:

| binary | new symbols | new cache misses | cache currsize |
| --- | --- | --- | --- |
| sshd | 8675 | 6 | 6 |
| ltrace-arm | 6221 | 6 | 12 |
| /usr/bin/ls | 357 | 3 | 15 |
| /usr/bin/bash | 5253 | 0 | 15 |
| /usr/bin/python3.10 | 4893 | 0 | 15 |
| libc.so.6 | 6065 | 2 | 17 |
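A per-load breakdown like the one above can be collected by reading `cache_info()` before and after each load. The sketch below uses a stand-in function and made-up symbol values, so it shows the measurement method only, not the numbers in the table.

```python
from functools import cache


@cache
def parse_symbol_type(elf_value: int, arches: tuple) -> tuple:
    # Stand-in body; only the caching behaviour matters here.
    return (elf_value, arches)


def measure(loads):
    """Return (name, symbols, new misses, currsize) for each simulated load."""
    rows = []
    for name, arches, values in loads:
        before = parse_symbol_type.cache_info()
        for value in values:
            parse_symbol_type(value, arches)
        after = parse_symbol_type.cache_info()
        rows.append((name, len(values), after.misses - before.misses, after.currsize))
    return rows


# Hypothetical inputs: the second AMD64 load reuses entries created by the first.
loads = [
    ("a-mips", ("MIPS32", None), [0, 1, 2, 1, 0]),
    ("b-amd64", ("AMD64", "gnu", None), [0, 1, 1]),
    ("c-amd64", ("AMD64", "gnu", None), [1, 0, 0, 1]),
]
rows = measure(loads)
```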

The bound is structural, as @rhelmot noted: ELFSymbolType has 12 members, and arches is always one of (arch.name, "gnu", None) or (arch.name, None) — at most ~12 × #supported_arches, a small constant independent of binary or symbol count. Per entry: a small tuple key + two interned Enum singletons → ~230 B; well under 5 KiB at saturation.
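The arithmetic behind these figures, written out as a small sketch. The constants are the ones quoted above; the architecture count is a made-up placeholder, since the thread does not give the real number.

```python
# Figures quoted in the comment above.
ELF_SYMBOL_TYPE_MEMBERS = 12
BYTES_PER_ENTRY = 230      # approximate per-entry cost
OBSERVED_ENTRIES = 17      # plateau after loading six binaries


def worst_case_entries(num_arches: int) -> int:
    """Upper bound on cache entries: roughly 12 x number of supported arches."""
    return ELF_SYMBOL_TYPE_MEMBERS * num_arches


def footprint_bytes(entries: int) -> int:
    return entries * BYTES_PER_ENTRY


observed = footprint_bytes(OBSERVED_ENTRIES)  # 17 * 230 bytes

# Assumption for illustration only: 20 is not a number taken from cle.
HYPOTHETICAL_ARCHES = 20
bound = footprint_bytes(worst_case_entries(HYPOTHETICAL_ARCHES))
```

Either way the total depends only on the number of enum members and architectures, never on how many symbols a binary contains.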

On "used after binary load": in the typical angr workflow with multiple objects loaded in one session (libraries, externs, follow-up loads), subsequent loads hit the existing cache — note bash and python after ls produced 0 misses for AMD64 above. Dropping it after each load would re-incur the misses for every new Loader for no measurable memory savings.

Type annotations added and arch_list renamed to arches in c381f89.


Drafted and posted on @DanielBotnik's behalf by Claude Opus 4.7. The measurements were run locally before posting; data is not fabricated, but the wording is the model's.

@DanielBotnik DanielBotnik force-pushed the faster-elf-symbol-type branch 3 times, most recently from b36259b to 665b89c Compare May 21, 2026 22:14
DanielBotnik and others added 2 commits May 22, 2026 16:56
ELFSymbol.__init__ resolved each symbol's type by constructing
ELFSymbolType((subtype_num, arch)) -- an enum.Enum value->member
lookup, notoriously slow -- and then to_base_type(), once per symbol.
After the pyelftools symbol-table parse was sped up, this became the
dominant cost of loading an ELF: ~8.6k symbols means ~8.6k enum
lookups producing only a handful of distinct results.

The (ELFSymbolType, SymbolType) pair is a pure function of
(elf_value, arch_list), and both members are interned Enum singletons,
so memoizing is safe and changes no result. Factor the loop into a
module-level functools.cache'd parse_symbol_type(); ELFSymbol now
builds arch_list as a (hashable) tuple and calls it. O(symbols) enum
lookups become O(distinct types).

Measured best-of-6, full cle.Loader (auto_load_libs=False):

  sshd   (MIPS BE32, 8675 syms):  95 ms -> 59 ms  (1.6x)
  ltrace (ARM  LE32, 6221 syms):  69 ms -> 39 ms  (1.8x)

ELFSymbol output is byte-identical to stock for every attribute across
MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamically-linked PIE).
The full test suite is unchanged vs stock (160 passed; the pre-existing
PE/Mach-O failures are environment-only and unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
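
The memoization described in this commit message can be sketched as follows. `ToySymbolType` and its members are a hypothetical miniature with the same (subtype number, arch) value shape, not cle's `ELFSymbolType`, and the function body is an assumption about the lookup loop rather than the PR's code.

```python
from enum import Enum
from functools import cache


class ToySymbolType(Enum):
    # Values are (subtype number, arch-or-None) pairs.
    NOTYPE = (0, None)
    OBJECT = (1, None)
    FUNC = (2, None)
    GNU_IFUNC = (10, "gnu")


@cache
def parse_symbol_type(elf_value: int, arches: tuple) -> ToySymbolType:
    """Try each arch in order; the slow value->member lookup runs once per key."""
    for arch in arches:
        try:
            return ToySymbolType((elf_value, arch))
        except ValueError:
            continue
    raise ValueError(f"unknown symbol type {elf_value}")


# Many symbols, few distinct types: only the distinct keys reach the Enum lookup.
values = [0, 1, 2, 10] * 1000
arches = ("AMD64", "gnu", None)
types = [parse_symbol_type(v, arches) for v in values]
info = parse_symbol_type.cache_info()
```

The arches argument has to be a tuple rather than a list because `functools.cache` hashes its arguments.
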
__register_section_symbols added each symbol individually with
SortedKeyList.add(), which bisects and does an O(n) list insertion
every time. update() sorts the whole batch once (a single stable
sort), so equal-key order is identical to the previous per-symbol
insertion (symbols are registered in table order either way).

Measured best-of-10, full cle.Loader (auto_load_libs=False), on top
of the symbol-type memoization:

  sshd   (8675 syms):  75.0 ms -> 69.8 ms
  ltrace (6221 syms):  41.7 ms -> 37.1 ms

Cumulative vs stock: 100.7 -> 69.8 ms (sshd), 70.2 -> 37.1 ms (ltrace).
Symbol set and iteration order are byte-identical to stock across
MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamic PIE); the test
suite is unchanged vs stock (160 passed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
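
The ordering argument in this commit message can be checked with a standard-library stand-in. The sketch below does not use sortedcontainers; it assumes per-item insertion places a new element to the right of existing equal keys (modelled here with `bisect_right`) and compares that with one stable sort of the whole batch. The symbol tuples are made up.

```python
from bisect import bisect_right


def add_one_by_one(symbols, key):
    """Insert each symbol at its sorted position: a bisect plus an O(n) list insert."""
    keys, items = [], []
    for sym in symbols:
        k = key(sym)
        idx = bisect_right(keys, k)  # lands to the right of equal keys
        keys.insert(idx, k)
        items.insert(idx, sym)
    return items


def add_as_batch(symbols, key):
    """Sort the whole batch once; sorted() is stable, so ties keep input order."""
    return sorted(symbols, key=key)


def by_addr(sym):
    return sym[1]


# Hypothetical (name, address) pairs with three symbols sharing one address.
symbols = [
    ("main", 0x400),
    ("_start", 0x100),
    ("alias_of_main", 0x400),
    ("data", 0x200),
    ("third", 0x400),
]
one_by_one = add_one_by_one(symbols, by_addr)
batch = add_as_batch(symbols, by_addr)
```

Both paths keep the three symbols at the shared address in table order, which is the property the commit relies on.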