Faster elf symbol type by DanielBotnik · Pull Request #683 · angr/cle

DanielBotnik · 2026-05-16T17:45:05Z

No description provided.

rhelmot

Very clean. Just address these issues and I'll merge it when CI passes.

rhelmot · 2026-05-18T21:15:28Z

+
+
+@cache
+def parse_symbol_type(elf_value, arch_list):


Can you add type annotations for this?

Since it's a tuple, we should probably not have it named arch_list. Maybe just arches?

I am not sure that we want to cache the result of parse_symbol_type() forever in memory.

The cache should have a fixed upper size of "every possible type", which is finite. I think this is fine.

Is this cache going to be used at all once the binary finishes loading? We should drop the cache if it’s never used after.

Context: we are trying to reduce memory usage of angr’s static analysis use cases.

Thanks for the review! I measured the cache empirically to settle the size question with concrete numbers.

After loading 6 diverse binaries (sshd MIPS-BE32, ltrace ARM-LE32, /usr/bin/{ls,bash,python3.10}, libc.so.6 — ~31k symbols total) the cache plateaus at 17 entries, with 22,489 hits / 17 misses (99.92% hit rate). Per-binary growth:

| binary | new symbols | new cache misses | cache currsize |
| --- | --- | --- | --- |
| sshd | 8675 | 6 | 6 |
| ltrace-arm | 6221 | 6 | 12 |
| /usr/bin/ls | 357 | 3 | 15 |
| /usr/bin/bash | 5253 | 0 | 15 |
| /usr/bin/python3.10 | 4893 | 0 | 15 |
| libc.so.6 | 6065 | 2 | 17 |

The bound is structural, as @rhelmot noted: ELFSymbolType has 12 members, and arches is always one of (arch.name, "gnu", None) or (arch.name, None) — at most ~12 × #supported_arches, a small constant independent of binary or symbol count. Per entry: a small tuple key + two interned Enum singletons → ~230 B; well under 5 KiB at saturation.

On "used after binary load": in the typical angr workflow with multiple objects loaded in one session (libraries, externs, follow-up loads), subsequent loads hit the existing cache — note bash and python after ls produced 0 misses for AMD64 above. Dropping it after each load would re-incur the misses for every new Loader for no measurable memory savings.
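For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the approach using `functools.cache` and its `cache_info()` / `cache_clear()` methods. The function `classify`, the helper `load`, and the symbol values are hypothetical stand-ins, not cle code; the sketch only shows how misses stop growing once the distinct keys have been seen, and what clearing the cache between loads costs.

```python
from functools import cache
from typing import Optional


@cache
def classify(elf_value: int, arches: tuple[Optional[str], ...]) -> str:
    # Toy stand-in for the memoized lookup; the real function returns enum members.
    return f"type-{elf_value}@{arches[0]}"


def load(symbol_values: list[int], arches: tuple[Optional[str], ...]) -> int:
    """Classify every symbol of one toy 'binary' and return the new cache misses."""
    before = classify.cache_info().misses
    for value in symbol_values:
        classify(value, arches)
    return classify.cache_info().misses - before


amd64 = ("AMD64", "gnu", None)
first = load([0, 1, 2] * 100, amd64)  # 300 symbols, 3 distinct types
second = load([1, 2] * 500, amd64)    # 1000 symbols, nothing new to cache
info = classify.cache_info()
print(first, second, info.currsize, info.hits)

classify.cache_clear()                # what dropping the cache after a load amounts to
third = load([1, 2] * 500, amd64)     # the same misses are paid again
print(third)
```

The cache size tracks the number of distinct `(elf_value, arches)` keys, not the number of symbols, which is the behavior the table above reports.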

Type annotations added and arch_list → arches renamed in c381f89.

Drafted and posted on @DanielBotnik's behalf by Claude Opus 4.7. The measurements were run locally before posting; data is not fabricated, but the wording is the model's.

ELFSymbol.__init__ resolved each symbol's type by constructing ELFSymbolType((subtype_num, arch)) -- an enum.Enum value->member lookup, notoriously slow -- and then to_base_type(), once per symbol. After the pyelftools symbol-table parse was sped up, this became the dominant cost of loading an ELF: ~8.6k symbols means ~8.6k enum lookups producing only a handful of distinct results.

The (ELFSymbolType, SymbolType) pair is a pure function of (elf_value, arch_list), and both members are interned Enum singletons, so memoizing is safe and changes no result. Factor the loop into a module-level functools.cache'd parse_symbol_type(); ELFSymbol now builds arch_list as a (hashable) tuple and calls it. O(symbols) enum lookups become O(distinct types).

Measured best-of-6, full cle.Loader (auto_load_libs=False):

sshd (MIPS BE32, 8675 syms): 95 ms -> 59 ms (1.6x)
ltrace (ARM LE32, 6221 syms): 69 ms -> 39 ms (1.8x)

ELFSymbol output is byte-identical to stock for every attribute across MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamically-linked PIE). The full test suite is unchanged vs stock (160 passed; the pre-existing PE/Mach-O failures are environment-only and unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
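The memoization described in this commit message can be sketched roughly as follows. This is not the code from the PR: `ToyElfType`, `BaseType`, `parse_toy_symbol_type`, and the member values are hypothetical, chosen only to mirror the described shape (an enum looked up by a `(number, arch)` value, a `to_base_type()` step, and a hashable tuple of arches as part of the cache key).

```python
from enum import Enum
from functools import cache
from typing import Optional


class BaseType(Enum):
    # Hypothetical generic symbol categories.
    UNKNOWN = 0
    OBJECT = 1
    FUNCTION = 2


class ToyElfType(Enum):
    # Hypothetical members keyed by (number, arch); None stands for "any arch".
    NOTYPE = (0, None)
    OBJECT = (1, None)
    FUNC = (2, None)
    GNU_IFUNC = (10, "gnu")

    def to_base_type(self) -> BaseType:
        if self in (ToyElfType.FUNC, ToyElfType.GNU_IFUNC):
            return BaseType.FUNCTION
        if self is ToyElfType.OBJECT:
            return BaseType.OBJECT
        return BaseType.UNKNOWN


@cache
def parse_toy_symbol_type(
    elf_value: int, arches: tuple[Optional[str], ...]
) -> tuple[Optional[ToyElfType], BaseType]:
    """Try each arch in order; return the first matching member and its base type."""
    for arch in arches:
        try:
            member = ToyElfType((elf_value, arch))  # value -> member lookup
        except ValueError:
            continue
        return member, member.to_base_type()
    return None, BaseType.UNKNOWN


# A tuple (not a list) so the argument is hashable and usable as a cache key.
arches = ("AMD64", "gnu", None)
results = [parse_toy_symbol_type(v, arches) for v in [2, 2, 10, 1, 2, 99]]
print(results[0], parse_toy_symbol_type.cache_info())
```

Because the function returns Enum members, which are singletons, a cached result is the same object a fresh lookup would produce; only the repeated lookups are skipped.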

__register_section_symbols added each symbol individually with SortedKeyList.add(), which bisects and does an O(n) list insertion every time. update() sorts the whole batch once (a single stable sort), so equal-key order is identical to the previous per-symbol insertion (symbols are registered in table order either way).

Measured best-of-10, full cle.Loader (auto_load_libs=False), on top of the symbol-type memoization:

sshd (8675 syms): 75.0 ms -> 69.8 ms
ltrace (6221 syms): 41.7 ms -> 37.1 ms

Cumulative vs stock: 100.7 -> 69.8 ms (sshd), 70.2 -> 37.1 ms (ltrace).

Symbol set and iteration order are byte-identical to stock across MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamic PIE); the test suite is unchanged vs stock (160 passed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
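The ordering argument in this commit message can be checked with a small standard-library sketch. It deliberately does not use sortedcontainers: `insert_one_by_one` is a hypothetical stand-in that assumes per-item insertion places an item after any existing items with an equal key, and `insert_as_batch` relies on Python's sort being stable. The symbol tuples are made up for illustration.

```python
import bisect


def insert_one_by_one(items, key):
    """Per-item sorted insertion: bisect to the right of equal keys, then insert."""
    keys, out = [], []
    for item in items:
        k = key(item)
        idx = bisect.bisect_right(keys, k)
        keys.insert(idx, k)    # shifts later elements on every insertion
        out.insert(idx, item)
    return out


def insert_as_batch(items, key):
    """Batch insertion: one stable sort over the whole batch."""
    return sorted(items, key=key)


def by_address(symbol):
    return symbol[0]


# (address, name) pairs in table order; several share an address.
symbols = [(0x40, "b"), (0x10, "a"), (0x40, "c"), (0x10, "d"), (0x20, "e"), (0x40, "f")]

incremental = insert_one_by_one(symbols, by_address)
batched = insert_as_batch(symbols, by_address)
print(incremental == batched)
```

Under that assumption both strategies keep equal-key items in their original table order, so the resulting sequences match while the batch version avoids the repeated list shifts.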

DanielBotnik force-pushed the faster-elf-symbol-type branch 2 times, most recently from 89881d9 to b336ee4 Compare May 16, 2026 17:48

rhelmot requested changes May 18, 2026

DanielBotnik force-pushed the faster-elf-symbol-type branch 3 times, most recently from b36259b to 665b89c Compare May 21, 2026 22:14

DanielBotnik and others added 2 commits May 22, 2026 16:56

DanielBotnik force-pushed the faster-elf-symbol-type branch from 665b89c to 3f7e5d0 Compare May 22, 2026 13:56

ltfish assigned rhelmot May 25, 2026

DanielBotnik wants to merge 2 commits into angr:master from DanielBotnik:faster-elf-symbol-type

DanielBotnik commented May 16, 2026

rhelmot left a comment

rhelmot May 18, 2026

rhelmot May 18, 2026

ltfish May 20, 2026

rhelmot May 20, 2026

ltfish May 20, 2026

DanielBotnik May 21, 2026 (edited)

3 participants
