Faster elf symbol type #683
89881d9 to b336ee4
rhelmot left a comment
Very clean. Just address these issues and I'll merge it when CI passes.
@cache
def parse_symbol_type(elf_value, arch_list):
Can you add type annotations for this?
Since it's a tuple, we should probably not have it named arch_list. Maybe just arches?
I am not sure that we want to cache the result of parse_symbol_type() forever in memory.
The cache should have a fixed upper size of "every possible type", which is finite. I think this is fine.
Is this cache going to be used at all once the binary finishes loading? We should drop the cache if it’s never used after.
Context: we are trying to reduce memory usage of angr’s static analysis use cases.
Thanks for the review! I measured the cache empirically to settle the size question with concrete numbers.
After loading 6 diverse binaries (sshd MIPS-BE32, ltrace ARM-LE32, /usr/bin/{ls,bash,python3.10}, libc.so.6 — ~31k symbols total) the cache plateaus at 17 entries, with 22,489 hits / 17 misses (99.92% hit rate). Per-binary growth:
| binary | new symbols | new cache misses | cache currsize |
|---|---|---|---|
| sshd | 8675 | 6 | 6 |
| ltrace-arm | 6221 | 6 | 12 |
| /usr/bin/ls | 357 | 3 | 15 |
| /usr/bin/bash | 5253 | 0 | 15 |
| /usr/bin/python3.10 | 4893 | 0 | 15 |
| libc.so.6 | 6065 | 2 | 17 |
The bound is structural, as @rhelmot noted: ELFSymbolType has 12 members, and arches is always one of (arch.name, "gnu", None) or (arch.name, None) — at most ~12 × #supported_arches, a small constant independent of binary or symbol count. Per entry: a small tuple key + two interned Enum singletons → ~230 B; well under 5 KiB at saturation.
On "used after binary load": in the typical angr workflow with multiple objects loaded in one session (libraries, externs, follow-up loads), subsequent loads hit the existing cache — note bash and python after ls produced 0 misses for AMD64 above. Dropping it after each load would re-incur the misses for every new Loader for no measurable memory savings.
Type annotations added and arch_list → arches renamed in c381f89.
Drafted and posted on @DanielBotnik's behalf by Claude Opus 4.7. The measurements were run locally before posting; data is not fabricated, but the wording is the model's.
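The per-binary miss counts reported above can be reproduced in miniature with `functools.cache`'s built-in statistics. The following is only a sketch: `classify`, `load`, and the arch tuples are hypothetical stand-ins rather than cle's actual code, and the symbol values are made up. It shows that a second load for an already-seen architecture adds no misses, and that clearing the cache between loads re-incurs them.

```python
from functools import cache


@cache
def classify(elf_value, arches):
    # Hypothetical stand-in for the memoized parser: the result depends
    # only on the (elf_value, arches) pair, so it is safe to memoize.
    return (elf_value, arches[0])


def load(symbol_values, arches):
    """Simulate loading one binary and return the new cache misses it caused."""
    before = classify.cache_info().misses
    for value in symbol_values:
        classify(value, arches)
    return classify.cache_info().misses - before


mips = ("MIPS32", None)
amd64 = ("AMD64", "gnu", None)

first = load([0, 1, 2, 2, 1, 0] * 100, mips)   # new arch: one miss per distinct value
second = load([0, 1, 2, 3] * 100, amd64)       # another new arch
third = load([2, 1, 0] * 100, amd64)           # same arch again: served from the cache
size_at_plateau = classify.cache_info().currsize

classify.cache_clear()                          # what "drop the cache after load" would do
fourth = load([2, 1, 0] * 100, amd64)          # the misses come back
```

The cache size tracks the number of distinct `(value, arches)` pairs, not the number of symbols processed, which is the property the table above relies on.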
b36259b to
665b89c
Compare
ELFSymbol.__init__ resolved each symbol's type by constructing ELFSymbolType((subtype_num, arch)) -- an enum.Enum value->member lookup, notoriously slow -- and then to_base_type(), once per symbol. After the pyelftools symbol-table parse was sped up, this became the dominant cost of loading an ELF: ~8.6k symbols means ~8.6k enum lookups producing only a handful of distinct results.

The (ELFSymbolType, SymbolType) pair is a pure function of (elf_value, arch_list), and both members are interned Enum singletons, so memoizing is safe and changes no result. Factor the loop into a module-level functools.cache'd parse_symbol_type(); ELFSymbol now builds arch_list as a (hashable) tuple and calls it. O(symbols) enum lookups become O(distinct types).

Measured best-of-6, full cle.Loader (auto_load_libs=False):

- sshd (MIPS BE32, 8675 syms): 95 ms -> 59 ms (1.6x)
- ltrace (ARM LE32, 6221 syms): 69 ms -> 39 ms (1.8x)

ELFSymbol output is byte-identical to stock for every attribute across MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamically-linked PIE). The full test suite is unchanged vs stock (160 passed; the pre-existing PE/Mach-O failures are environment-only and unrelated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
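A minimal sketch of the memoization pattern this commit message describes, under stated assumptions: `DemoSymbolType` is a hypothetical stand-in for ELFSymbolType with invented members, the function body is a guess at the shape of the lookup loop rather than cle's real implementation, and the parameter is called `arches` following the rename requested in review. It illustrates two points from the message: the argument must be a hashable tuple, and repeated lookups collapse to one enum construction per distinct key.

```python
import enum
from functools import cache
from typing import Optional, Tuple


class DemoSymbolType(enum.Enum):
    # Hypothetical members; each value is a (type number, arch) pair.
    STT_NOTYPE = (0, None)
    STT_OBJECT = (1, None)
    STT_FUNC = (2, None)
    STT_GNU_IFUNC = (10, "gnu")


@cache
def parse_symbol_type(elf_value: int, arches: Tuple[Optional[str], ...]) -> DemoSymbolType:
    # Try each candidate arch in order; the first (value, arch) pair that
    # names an enum member wins. Enum value lookup raises ValueError on a miss.
    for arch in arches:
        try:
            return DemoSymbolType((elf_value, arch))
        except ValueError:
            continue
    raise ValueError(f"unknown symbol type {elf_value}")


arches = ("AMD64", "gnu", None)
values = [2, 1, 2, 2, 10, 1, 0, 2]          # eight symbols, four distinct types
types = [parse_symbol_type(v, arches) for v in values]
info = parse_symbol_type.cache_info()

# A list argument cannot be used as a cache key, hence the tuple.
try:
    parse_symbol_type(2, ["AMD64", None])
    list_rejected = False
except TypeError:
    list_rejected = True
```

Because Enum members are singletons, the cached return value is the same object an uncached lookup would produce, so callers cannot observe the difference.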
__register_section_symbols added each symbol individually with SortedKeyList.add(), which bisects and does an O(n) list insertion every time. update() sorts the whole batch once (a single stable sort), so equal-key order is identical to the previous per-symbol insertion (symbols are registered in table order either way).

Measured best-of-10, full cle.Loader (auto_load_libs=False), on top of the symbol-type memoization:

- sshd (8675 syms): 75.0 ms -> 69.8 ms
- ltrace (6221 syms): 41.7 ms -> 37.1 ms

Cumulative vs stock: 100.7 -> 69.8 ms (sshd), 70.2 -> 37.1 ms (ltrace).

Symbol set and iteration order are byte-identical to stock across MIPS BE32, ARM LE32 and x86-64 LE64 (incl. a dynamic PIE); the test suite is unchanged vs stock (160 passed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
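The ordering argument in this commit message can be checked with a small standard-library sketch. It does not use sortedcontainers or reproduce SortedKeyList's internals; `add_one_by_one` and `add_as_batch` are hypothetical helpers that model per-item insertion with `bisect` (placing each new item after existing equal keys, which is assumed to match `add()`) and batch insertion with a single stable sort. The symbols and addresses are invented.

```python
import bisect


def add_one_by_one(existing, new_items, key):
    # Model of per-item insertion: bisect for the position, then insert.
    # Each list.insert is O(n), so the whole loop is quadratic in the worst case.
    out = list(existing)
    keys = [key(item) for item in out]
    for item in new_items:
        k = key(item)
        idx = bisect.bisect_right(keys, k)   # lands after any equal keys
        keys.insert(idx, k)
        out.insert(idx, item)
    return out


def add_as_batch(existing, new_items, key):
    # Model of batch insertion: one stable sort over everything.
    # Stability keeps equal-key items in their original relative order.
    return sorted(list(existing) + list(new_items), key=key)


def by_address(symbol):
    return symbol[0]


existing = [(0x1000, "a"), (0x2000, "b")]   # already sorted by address
new = [(0x2000, "c"), (0x1000, "d"), (0x3000, "e"), (0x2000, "f")]

one_by_one = add_one_by_one(existing, new, by_address)
batch = add_as_batch(existing, new, by_address)
names = [name for _, name in batch]
```

Both paths give the same sequence, including the relative order of symbols that share an address, which is why the batch form can replace the loop without changing iteration order.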
665b89c to
3f7e5d0
Compare