
Add dump symbols options #538

Open
Dragorn421 wants to merge 7 commits into ethteck:main from Dragorn421:add_dump_symbols_options_w_rework_bss_size

@Dragorn421
Collaborator

@Dragorn421 Dragorn421 commented May 19, 2026

This PR adds information to the splat symbol dump, partially gated behind the new dump_symbols_references option:

  dump_symbols: true
  dump_symbols_references: true

As a reminder, the existing dump_symbols option dumps symbols into the following files under .splat/: spim_context.csv, spim_context_unksegment.csv, and splat_symbols.csv.

The changes add columns to the splat_symbols.csv file.

The following columns are added unconditionally: segment,subsegment,subsegment_type, where

  • segment is the splat name of the symbol's segment (as defined in the yaml), or None if the symbol is not tied to a segment
  • subsegment is the splat name of the subsegment the symbol is in, if any, or None
  • subsegment_type is the type of that subsegment (such as asm, c, data, ...) or None if there's no tied subsegment
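A minimal sketch of how these three columns could be consumed, assuming the CSV also has a name column (which the script later in this description reads) and that a missing value shows up either as the text None or as an empty cell. The sample rows, the symbol osSomething and the helper group_by_subsegment are made up for illustration and are not part of splat.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; real files are written to .splat/splat_symbols.csv.
SAMPLE = """\
name,segment,subsegment,subsegment_type
func_801C77C8,n64dd,n64dd_801C77C0,asm
D_801D95F0,n64dd,n64dd_data,data
osSomething,None,None,None
"""


def group_by_subsegment(csv_file):
    """Map (segment, subsegment, subsegment_type) to the symbol names it contains."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Assumption: a missing value appears as "None" or as an empty cell.
        key = tuple(
            None if row[col] in ("", "None") else row[col]
            for col in ("segment", "subsegment", "subsegment_type")
        )
        groups[key].append(row["name"])
    return dict(groups)


groups = group_by_subsegment(io.StringIO(SAMPLE))
for key, names in groups.items():
    print(key, names)
```

Grouping on the full triple keeps symbols that are not tied to any segment in their own bucket instead of dropping them.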

The new option dump_symbols_references adds the following column: referenced_by

  • referenced_by is a |-separated list of symbol names that reference the symbol of the current csv line. For example
    • `` (empty string) for an unreferenced symbol
    • func_801C77C8 for a symbol referenced by a single function, func_801C77C8
    • leoCommand|D_801D95F0|leomain|LeoReset for a symbol referenced by three functions and a data symbol
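As a rough illustration of working with this column, the sketch below splits a referenced_by cell and inverts the relation, so that each referencing symbol maps to the symbols it references. The helper names and the D_801D96xx target symbols are hypothetical; only the |-separated format and the empty-cell case come from the description above.

```python
from collections import defaultdict


def parse_referenced_by(field):
    """Split a referenced_by cell into a list of symbol names."""
    # "".split("|") would give [""], so the empty cell is handled separately.
    return field.split("|") if field else []


def outgoing_references(referenced_by_cells):
    """Invert {symbol: referenced_by cell} into {referencer: [symbols it references]}."""
    outgoing = defaultdict(list)
    for name, cell in referenced_by_cells.items():
        for referencer in parse_referenced_by(cell):
            outgoing[referencer].append(name)
    return dict(outgoing)


# Hypothetical cells, reusing the example values from the list above.
cells = {
    "D_801D9600": "leoCommand|D_801D95F0|leomain|LeoReset",
    "D_801D9604": "func_801C77C8",
    "D_801D9608": "",
}
refs = outgoing_references(cells)
print(refs)
```

The inverted map answers "what does this function touch?", which is the direction the arrows take in a reference graph.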

Use case

These new columns make it possible to visualize the relationships between symbols, which helps identify how to split asm and rodata sections and how to associate them.

As an example, take the following python script:

#!/usr/bin/env python3

# SPDX-FileCopyrightText: 2026 Dragorn421
# SPDX-License-Identifier: CC0-1.0

import argparse
import csv
import dataclasses


@dataclasses.dataclass(frozen=True)
class Sym:
    vram_start: int
    name: str
    type: str
    segment: str
    subsegment: str
    subsegment_type: str
    referenced_by: tuple[str, ...]


syms = list[Sym]()

with open(".splat/splat_symbols.csv") as f:
    for row in csv.DictReader(f):
        if row["referenced_by"] == "":
            referenced_by = []
        else:
            referenced_by = row["referenced_by"].split("|")
        syms.append(
            Sym(
                int(row["vram_start"], 16),
                row["name"],
                row["type"],
                row["segment"],
                row["subsegment"],
                row["subsegment_type"],
                tuple(referenced_by),
            )
        )

sym_by_name = {_sym.name: _sym for _sym in syms}

parser = argparse.ArgumentParser()
parser.add_argument("segment")
parser.add_argument(
    "--section",
    nargs="+",
    help=(
        "only show this section besides text,"
        " eg --section rodata will only show text and rodata"
    ),
)
args = parser.parse_args()

section_by_subsegment_type = {
    "asm": "text",
    "c": "text",
    "textbin": "text",
    "hasm": "text",
    "data": "data",
    "rodata": "rodata",
    ".rodata": "rodata",
    "bss": "bss",
}

syms_by_section: dict[str, list[Sym]] = {}
for sym in syms:
    if sym.segment != args.segment:
        continue
    section = section_by_subsegment_type.get(sym.subsegment_type)
    assert section is not None, sym
    syms_by_section.setdefault(section, []).append(sym)

text_subsegments = sorted({_sym.subsegment for _sym in syms_by_section["text"]})
color_by_subsegment: dict[str, str] = {}
for subsegment in text_subsegments:
    h = (len(color_by_subsegment) * 0.7) % 1
    color_by_subsegment[subsegment] = f"{h} 1 1"

if args.section:
    for section in list(syms_by_section.keys()):
        if section != "text" and section not in args.section:
            del syms_by_section[section]

section_by_sym_name = {
    _sym.name: _section for _section, _syms in syms_by_section.items() for _sym in _syms
}

vram_start_by_section: dict[str, int] = {}
for section, section_syms in syms_by_section.items():
    vram_start_by_section[section] = min(_s.vram_start for _s in section_syms)


colw = 10
x_by_section = {
    "text": 0 * colw,
    "data": 1 * colw,
    "rodata": 2 * colw,
    "bss": 3 * colw,
}


def gprint(l: str):
    print(l)


gprint("digraph {")

for section, section_syms in syms_by_section.items():
    section_vram_start = vram_start_by_section[section]
    x = x_by_section[section]
    filtered_syms: list[Sym] = []
    for sym in sorted(section_syms, key=lambda sym: sym.vram_start):
        if sym.type in {"label", "jtbl_label"}:
            continue
        filtered_syms.append(sym)
    cur_subsegment = None
    i = 0
    dy = 0
    for sym in filtered_syms:
        if cur_subsegment != sym.subsegment:
            if cur_subsegment is not None:
                gprint("}")
            cur_subsegment = sym.subsegment
            gprint(f"subgraph cluster_{cur_subsegment}_{section} " "{")
            y = -i / len(filtered_syms) * 100 + dy - 0.2
            gprint(f'"{cur_subsegment} {section}"' " [" f' pos = "{x},{y}!"' f' color="none"' " ]")
            dy -= 0.8
        assert cur_subsegment is not None
        if 0:
            # y = vram position
            y = -(sym.vram_start - section_vram_start) / 500
        y = -i / len(filtered_syms) * 100 + dy
        i += 1
        color = None
        if section == "text":
            color = color_by_subsegment[cur_subsegment]
        elif section == "rodata":
            if sym.type == "jtbl":
                color = "magenta"
        gprint(
            f'"{sym.name}"'
            " ["
            f' pos = "{x},{y}!"'
            + (f' color="{color}"' if color is not None else "")
            + " ]"
        )
    if cur_subsegment is not None:
        gprint("}")

for section, section_syms in syms_by_section.items():
    for sym in section_syms:
        for sym_ref_by in sym.referenced_by:
            if (
                # ignore references from outside the segment
                sym_ref_by in section_by_sym_name
                # ignore same-section references
                and section_by_sym_name[sym_ref_by] != section
                # only show
                and (
                    # references from text
                    section_by_sym_name[sym_ref_by] == "text"
                    # or references from data to rodata
                    or (
                        section_by_sym_name[sym_ref_by] == "data"
                        and section_by_subsegment_type[sym.subsegment_type] == "rodata"
                    )
                )
            ):
                try:
                    color = color_by_subsegment[sym_by_name[sym_ref_by].subsegment]
                except KeyError:
                    color = "black"
                gprint(f'"{sym_ref_by}" -> "{sym.name}"' f' [ color = "{color}" ]')

gprint("}")

This script reads the contents of .splat/splat_symbols.csv and, given a segment name, outputs a graph in dot language of that segment's symbols and their relationships. The symbols are also visually clustered by subsegment. Optionally, the script takes a --section section1 [section2 ...] argument to restrict the visualization to text plus the given sections (the possible sections are text, data, rodata and bss).

As an example, here is the result of

./graph_cross_sections_refs.py n64dd --section rodata > n64dd.dot && neato -Tsvg -O n64dd.dot

(using Graphviz's neato to render the dot output to SVG)

n64dd.dot.svg:

[image: n64dd dot]

We can see two columns: the left column is the text section, and the right column is the rodata section. Since the script received --section rodata, only text and rodata are shown.

Text symbols and their outgoing references (represented by the arrows) are colored differently per subsegment.

Subsegments are also clustered and outlined by black rectangles, with the name of each subsegment shown at the top of its rectangle.
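To make the column layout more concrete, here is a stripped-down sketch (not the script itself) of how pinned node positions produce one column per section. It relies on Graphviz's pos attribute, where a trailing ! tells neato to keep the node at the given coordinates. The function names and the two-symbol input are hypothetical.

```python
def pinned_node(name, x, y, color=None):
    """Return a dot node statement pinned at (x, y) via the trailing "!"."""
    attrs = [f'pos="{x},{y}!"']
    if color is not None:
        attrs.append(f'color="{color}"')
    return f'"{name}" [ {" ".join(attrs)} ]'


def two_column_graph(text_syms, rodata_syms, edges, col_width=10):
    """Lay out text symbols at x=0 and rodata symbols at x=col_width, top to bottom."""
    lines = ["digraph {"]
    for x, names in ((0, text_syms), (col_width, rodata_syms)):
        for i, name in enumerate(names):
            # Decreasing y places later symbols lower in the column.
            lines.append("  " + pinned_node(name, x, -i))
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}"')
    lines.append("}")
    return "\n".join(lines)


dot = two_column_graph(
    ["func_801C77C8"],
    ["D_801D95F0"],
    [("func_801C77C8", "D_801D95F0")],
)
print(dot)
```

The printed text can be saved to a file and rendered with neato in the same way as the command shown earlier.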

@AngheloAlf
Collaborator

I think we should get rid of dump_symbols_segments and just make it part of the default behavior.

I can see dump_symbols_references being too verbose, so it should be fine behind a flag, I guess. Could you add a bit of docs about it in Configuration.md?

Collaborator

@AngheloAlf AngheloAlf left a comment


Ohh, I like the write up in the Advanced.md docs.

Thanks!
