Skip to content

fix: count parameters shared across modules once (#322, #377, #358, #303)#397

Merged
TylerYep merged 3 commits into
TylerYep:mainfrom
Mikyx-1:fix/issue-322
Jun 15, 2026
Merged

fix: count parameters shared across modules once (#322, #377, #358, #303)#397
TylerYep merged 3 commits into
TylerYep:mainfrom
Mikyx-1:fix/issue-322

Conversation

@Mikyx-1

@Mikyx-1 Mikyx-1 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes parameter over-counting for any model that shares parameters, so summary(...).total_params always equals sum(p.numel() for p in model.parameters()) — what PyTorch itself reports.

Important

Stacked on #396 (fix for #327). This branch contains the #327 commit as its base, so the diff currently shows both commits. Please merge #396 first; afterwards this PR's diff reduces to just the second commit (fix: count parameters shared across modules once). The two fixes are complementary and only together resolve all the linked issues — see below.

The two complementary bugs

Shared parameters reach torchinfo two ways, and each was counted multiple times:

Mechanism Fixed by
A One module instance reused under several parents (e.g. YOLO detection layers) #327 / #396
B One parameter tensor referenced by distinct modules — weight tying: tied embeddings / lm_head, shared projection heads (T5, SD, LLMs) this PR

#327 dedups by id(module), so case B slips through (tied tensors live on different module objects). This PR closes case B.

Root cause (B)

total_params was built bottom-up by summing each row's num_params, with no dedup across parameter tensors. A tensor referenced by N modules was added N times.

Fix

Take parameter totals from the root module. Module.named_parameters() already deduplicates shared tensors (remove_duplicate=True) and includes submodules not run in the forward pass, so this matches sum(p.numel() for p in model.parameters()) exactly. A module whose parameters were all already counted by an earlier row is marked (recursive), so the per-row column still sums to the total.

Issues resolved (verified)

Issue Symptom Result on this branch
#322 overcount 1.78–2.37× (YOLO, T5, SD) ✅ YOLOv8n 5,257,936 → 3,157,200; YOLOv10n 4,932,416 → 2,775,520 (matches model.info())
#377 tied embeddings / lm_head double-counted ✅ counted once by default
#358 flan-t5-small reports 128M, expected 76,961,152 76,961,152 (exact)
#303 flan-t5-small reports 109M, expected ~77M 76,961,152 (= model.parameters())

YOLO is fixed by the #327 half (module reuse); the tied-weight cases by this half. Both are needed.

Tests

  • New TiedWeightsModel fixture + test_tied_weights regression test.
  • flan_t5_small.out snapshot regenerated (Python 3.14): total 93,410,68876,961,152, decoder embedding row now correctly (recursive).
  • Full suite green (snapshots regenerated under Python 3.14 to match CI); zero regressions.

🤖 Generated with Claude Code

Mikyx-1 and others added 3 commits June 7, 2026 16:44
…Yep#377)

torchinfo built total_params by summing each row's num_params, with no
deduplication across parameter tensors. When one tensor is referenced by
multiple distinct modules (weight tying -- tied embeddings / lm_head,
shared projection heads, etc.) it was counted once per referencing
module, overestimating the total (e.g. flan-t5-small reported
93,410,688 vs the true 76,961,152).

This differs from the module-instance sharing fixed in TylerYep#327: tied tensors
live on different module objects, so id(module)-based recursion detection
doesn't catch them.

Take parameter totals from the root module instead. A module's
named_parameters() already deduplicates shared tensors (remove_duplicate
defaults to True) and includes submodules not run in the forward pass, so
this matches `sum(p.numel() for p in model.parameters())`. A module whose
parameters were all already counted by an earlier row is marked
"(recursive)" so the per-row counts still sum to the total.

Add TiedWeightsModel + test_tied_weights, and regenerate the flan-t5
snapshot (Python 3.14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TylerYep

Copy link
Copy Markdown
Owner

Looks good, thanks for the fix!

@TylerYep TylerYep merged commit 7abc762 into TylerYep:main Jun 15, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants