Add dump symbols options by Dragorn421 · Pull Request #538 · ethteck/splat

Dragorn421 · 2026-05-19T13:08:07Z

This PR adds information to the splat symbol dump, partially gated behind the new dump_symbols_references option:

  dump_symbols: true
  dump_symbols_references: true

As a reminder, the existing dump_symbols option dumps symbols to the following files in .splat/: spim_context.csv, spim_context_unksegment.csv, and splat_symbols.csv.

The changes add columns to the splat_symbols.csv file.

The following columns are added unconditionally: segment,subsegment,subsegment_type, where

- segment is the splat name of the symbol's segment (as defined in the yaml), or None if the symbol is not tied to a segment
- subsegment is the splat name of the subsegment the symbol is in, or None if there is none
- subsegment_type is the type of that subsegment (such as asm, c, data, ...), or None if the symbol is not tied to a subsegment
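
As a rough illustration of how these three columns can be consumed, the sketch below counts symbols per (segment, subsegment, subsegment_type) triple. The sample rows and the helper name count_symbols_by_subsegment are made up for this example, and it assumes that a missing value is written to the csv as the literal text None.

```python
import collections
import csv
import io


def count_symbols_by_subsegment(csv_file):
    """Count symbols per (segment, subsegment, subsegment_type) triple."""
    counts = collections.Counter()
    for row in csv.DictReader(csv_file):
        key = (row["segment"], row["subsegment"], row["subsegment_type"])
        counts[key] += 1
    return counts


# Hypothetical sample rows, not taken from a real dump.
# Assumption: symbols without a segment/subsegment show up as the text "None".
SAMPLE = """\
name,segment,subsegment,subsegment_type
func_801C77C8,n64dd,n64dd_code,asm
leomain,n64dd,n64dd_code,asm
D_801D95F0,n64dd,n64dd_data,data
some_abs_sym,None,None,None
"""

counts = count_symbols_by_subsegment(io.StringIO(SAMPLE))
for key, n in sorted(counts.items()):
    print(key, n)
```

With a real dump, the same function could be called on open(".splat/splat_symbols.csv") instead of the in-memory sample.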

The new option dump_symbols_references adds the following column: referenced_by

referenced_by is a |-separated list of the names of the symbols that reference the symbol on the current csv line. For example:
- `` (empty string) for an unreferenced symbol
- func_801C77C8 for a symbol referenced by a single function, func_801C77C8
- leoCommand|D_801D95F0|leomain|LeoReset for a symbol referenced by three functions and a data symbol
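
A minimal sketch of parsing a referenced_by cell, using the three examples above. The helper name parse_referenced_by is hypothetical. The empty string needs a special case because "".split("|") returns [""] in Python rather than an empty list.

```python
def parse_referenced_by(value: str) -> list[str]:
    """Split a referenced_by cell into symbol names.

    An empty cell means the symbol is unreferenced.
    """
    return value.split("|") if value else []


print(parse_referenced_by(""))
print(parse_referenced_by("func_801C77C8"))
print(parse_referenced_by("leoCommand|D_801D95F0|leomain|LeoReset"))
```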

Use case

These new columns help visualize the relationships between symbols, which in turn helps identify how to split asm and rodata sections and how to associate them.

As an example, take the following Python script:

Details

#!/usr/bin/env python3

# SPDX-FileCopyrightText: 2026 Dragorn421
# SPDX-License-Identifier: CC0-1.0

import argparse
import csv
import dataclasses


@dataclasses.dataclass(frozen=True)
class Sym:
    vram_start: int
    name: str
    type: str
    segment: str
    subsegment: str
    subsegment_type: str
    referenced_by: tuple[str, ...]


syms = list[Sym]()

with open(".splat/splat_symbols.csv") as f:
    for row in csv.DictReader(f):
        if row["referenced_by"] == "":
            referenced_by = []
        else:
            referenced_by = row["referenced_by"].split("|")
        syms.append(
            Sym(
                int(row["vram_start"], 16),
                row["name"],
                row["type"],
                row["segment"],
                row["subsegment"],
                row["subsegment_type"],
                tuple(referenced_by),
            )
        )

sym_by_name = {_sym.name: _sym for _sym in syms}

parser = argparse.ArgumentParser()
parser.add_argument("segment")
parser.add_argument(
    "--section",
    nargs="+",
    help=(
        "only show this section besides text,"
        " eg --section rodata will only show text and rodata"
    ),
)
args = parser.parse_args()

section_by_subsegment_type = {
    "asm": "text",
    "c": "text",
    "textbin": "text",
    "hasm": "text",
    "data": "data",
    "rodata": "rodata",
    ".rodata": "rodata",
    "bss": "bss",
}

syms_by_section: dict[str, list[Sym]] = {}
for sym in syms:
    if sym.segment != args.segment:
        continue
    section = section_by_subsegment_type.get(sym.subsegment_type)
    assert section is not None, sym
    syms_by_section.setdefault(section, []).append(sym)

text_subsegments = sorted({_sym.subsegment for _sym in syms_by_section["text"]})
color_by_subsegment: dict[str, str] = {}
for subsegment in text_subsegments:
    h = (len(color_by_subsegment) * 0.7) % 1
    color_by_subsegment[subsegment] = f"{h} 1 1"

if args.section:
    for section in list(syms_by_section.keys()):
        if section != "text" and section not in args.section:
            del syms_by_section[section]

section_by_sym_name = {
    _sym.name: _section for _section, _syms in syms_by_section.items() for _sym in _syms
}

vram_start_by_section: dict[str, int] = {}
for section, section_syms in syms_by_section.items():
    vram_start_by_section[section] = min(_s.vram_start for _s in section_syms)


colw = 10
x_by_section = {
    "text": 0 * colw,
    "data": 1 * colw,
    "rodata": 2 * colw,
    "bss": 3 * colw,
}


def gprint(l: str):
    print(l)


gprint("digraph {")

for section, section_syms in syms_by_section.items():
    section_vram_start = vram_start_by_section[section]
    x = x_by_section[section]
    filtered_syms: list[Sym] = []
    for sym in sorted(section_syms, key=lambda sym: sym.vram_start):
        if sym.type in {"label", "jtbl_label"}:
            continue
        filtered_syms.append(sym)
    cur_subsegment = None
    i = 0
    dy = 0
    for sym in filtered_syms:
        if cur_subsegment != sym.subsegment:
            if cur_subsegment is not None:
                gprint("}")
            cur_subsegment = sym.subsegment
            gprint(f"subgraph cluster_{cur_subsegment}_{section} " "{")
            y = -i / len(filtered_syms) * 100 + dy - 0.2
            gprint(f'"{cur_subsegment} {section}"' " [" f' pos = "{x},{y}!"' f' color="none"' " ]")
            dy -= 0.8
        assert cur_subsegment is not None
        if 0:
            # y = vram position
            y = -(sym.vram_start - section_vram_start) / 500
        y = -i / len(filtered_syms) * 100 + dy
        i += 1
        color = None
        if section == "text":
            color = color_by_subsegment[cur_subsegment]
        elif section == "rodata":
            if sym.type == "jtbl":
                color = "magenta"
        gprint(
            f'"{sym.name}"'
            " ["
            f' pos = "{x},{y}!"'
            + (f' color="{color}"' if color is not None else "")
            + " ]"
        )
    if cur_subsegment is not None:
        gprint("}")

for section, section_syms in syms_by_section.items():
    for sym in section_syms:
        for sym_ref_by in sym.referenced_by:
            if (
                # ignore references from outside the segment
                sym_ref_by in section_by_sym_name
                # ignore same-section references
                and section_by_sym_name[sym_ref_by] != section
                # only show
                and (
                    # references from text
                    section_by_sym_name[sym_ref_by] == "text"
                    # or references from data to rodata
                    or (
                        section_by_sym_name[sym_ref_by] == "data"
                        and section_by_subsegment_type[sym.subsegment_type] == "rodata"
                    )
                )
            ):
                try:
                    color = color_by_subsegment[sym_by_name[sym_ref_by].subsegment]
                except KeyError:
                    color = "black"
                gprint(f'"{sym_ref_by}" -> "{sym.name}"' f' [ color = "{color}" ]')

gprint("}")

This script reads .splat/splat_symbols.csv and, given a segment name, outputs a graph in the DOT language of that segment's symbols and their relationships. The symbols are also visually clustered by subsegment. Optionally, the script takes a --section section1 [section2 ...] argument to restrict the visualization to text plus the given sections (among text, data, rodata, bss).

As an example, here is the result of

./graph_cross_sections_refs.py n64dd --section rodata > n64dd.dot && neato -Tsvg -O n64dd.dot

(using Graphviz's neato to render from the DOT language to SVG)

n64dd.dot.svg:

Details

We can see two columns: the left column is the text section, and the right column is the rodata section (since the script received --section rodata, only text and rodata are shown).

Text symbols and their outgoing references (represented by the arrows) are colored differently per subsegment.

Subsegments are also clustered and outlined by black rectangles, with the name of each subsegment shown at the top of its cluster.
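
The same question the graph answers visually (which text subsegments use a given rodata symbol) can also be answered in plain text. The following is only a sketch under assumptions: the sample rows, the subsegment names, and the helper rodata_owners are invented for illustration, and the text and rodata subsegment types are the ones listed in the script above.

```python
import csv
import io

# Subsegment types treated as text / rodata, matching the script above
TEXT_TYPES = {"asm", "c", "hasm", "textbin"}
RODATA_TYPES = {"rodata", ".rodata"}


def rodata_owners(csv_file, segment):
    """Map each rodata symbol of `segment` to the set of text subsegments
    whose symbols reference it."""
    rows = [r for r in csv.DictReader(csv_file) if r["segment"] == segment]
    # Name of each text symbol -> subsegment it lives in
    text_subsegment = {
        r["name"]: r["subsegment"]
        for r in rows
        if r["subsegment_type"] in TEXT_TYPES
    }
    owners = {}
    for r in rows:
        if r["subsegment_type"] not in RODATA_TYPES:
            continue
        refs = r["referenced_by"].split("|") if r["referenced_by"] else []
        # References from outside the segment or from non-text symbols are ignored
        owners[r["name"]] = {
            text_subsegment[name] for name in refs if name in text_subsegment
        }
    return owners


# Hypothetical sample rows, not taken from a real dump
SAMPLE = """\
name,segment,subsegment,subsegment_type,referenced_by
leomain,n64dd,code_a,asm,
LeoReset,n64dd,code_b,asm,
RO_1,n64dd,rodata_blob,rodata,leomain
RO_2,n64dd,rodata_blob,rodata,leomain|LeoReset
RO_3,n64dd,rodata_blob,rodata,
"""

owners = rodata_owners(io.StringIO(SAMPLE), "n64dd")
for name, subsegments in owners.items():
    print(name, sorted(subsegments))
```

A rodata symbol with exactly one owning text subsegment is a likely candidate for being split alongside that subsegment, while one with several owners or none needs a closer look.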

AngheloAlf · 2026-05-19T22:46:51Z

I think we should get rid of dump_symbols_segments and just make it part of the default behavior.

I can see dump_symbols_references being too verbose, so it should be fine behind a flag i guess. Could you add a bit of docs about it on Configuration.md?

AngheloAlf

Ohh, I like the write up in the Advanced.md docs.

Thanks!

Dragorn421 added 3 commits May 19, 2026 14:39

Add dump_symbols_segments and dump_symbols_references options

4236b5d

add subsegment_type column

ff09f99

fix changelog

70dad64

Dragorn421 added 4 commits May 21, 2026 05:51

yeet dump_symbols_segments option

b8dbfe4

add docs to docs/Configuration.md

43a150d

docs: Add "Visualizing the relationships between symbols" section to …

3a5f9a8

…Advanced.md

fixup Advanced.md

484c74d

AngheloAlf approved these changes May 21, 2026

Dragorn421 wants to merge 7 commits into ethteck:main from Dragorn421:add_dump_symbols_options_w_rework_bss_size
