Fix Boltz-2 pLDDT/PAE indexing for residue-level scores #24

shahryary · 2025-12-08T14:33:07Z

This PR fixes two issues when running ipsae.py on Boltz-2 structures:

plddt_AAAAA_model_0.npz and pae_AAAAA_model_0.npz were indexed with
token_array.astype(bool), which assumes a 1:1 correspondence between the
Boltz pLDDT/PAE vectors and the token_mask built from the mmCIF
_atom_site table. For some Boltz-2 outputs this does not hold, which leads to:
- IndexError in the Boltz block:
  IndexError: index XXX is out of bounds for axis 0 with size Y
- A second IndexError later in the pDockQ calculation:
  mean_plddt = cb_plddt[list(pDockQ_unique_residues[chain1][chain2])].mean()
cb_plddt could end up having a length different from the number of scored
residues (numres), while downstream code assumes residue-level arrays of
length numres (e.g. for pDockQ, ipSAE by residue).
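
The second failure can be reproduced in isolation. The snippet below is a minimal sketch, not code from ipsae.py: the array sizes, the residue set and the helper name mean_interface_plddt are hypothetical, chosen only to mirror the pDockQ line quoted above.

```python
import numpy as np

# Hypothetical sizes: the script expects numres residue-level values,
# but cb_plddt ended up shorter than that.
numres = 604
cb_plddt = np.zeros(600)

# Hypothetical set of interface residue indices, valid for 0..numres-1.
interface_residues = {598, 599, 600}

def mean_interface_plddt(cb_plddt, residues):
    """Mirror of the pDockQ indexing pattern: fancy-index by residue number."""
    return cb_plddt[list(residues)].mean()

try:
    mean_interface_plddt(cb_plddt, interface_residues)
    error_message = None
except IndexError as exc:
    # Residue 600 is a valid residue index, but not a valid index into
    # an array of length 600.
    error_message = str(exc)

print(error_message)
```

Any residue index at or beyond len(cb_plddt) triggers the error, so the crash only appears for chain pairs whose interface includes residues near the end of the structure.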

What this change does

For boltz-1/boltz-2 inputs we now:

Load plddt from plddt_*.npz, scale it to 0–100, and then:
- If len(plddt) >= max(CA_atom_num)+1, treat it as per-atom and build
  residue-level plddt / cb_plddt using CA_atom_num / CB_atom_num
  (same strategy as the AF3 code path).
- If len(plddt) == numres, treat it as per-residue and use it directly.
- Otherwise, fall back to truncating/padding to numres with a warning so
  that downstream calculations never hit an out-of-bounds error.
Load pae from pae_*.npz and ensure pae_matrix is (numres, numres):
- If the matrix is larger, truncate to [:numres, :numres].
- If it is exactly numres x numres, use it as-is.
- Otherwise, emit a warning and use the raw matrix.
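
The pLDDT branch of the steps above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in this PR: the function name residue_level_plddt is hypothetical, the stored values are assumed to be on a 0–1 scale, zero is an assumed padding value, and the per-atom check also considers the CB indices so that the sketch itself cannot index out of range.

```python
import warnings
import numpy as np

def residue_level_plddt(plddt, ca_atom_num, cb_atom_num, numres):
    """Return (plddt, cb_plddt), each of length numres, on a 0-100 scale."""
    plddt = np.asarray(plddt, dtype=float) * 100.0  # assumes 0-1 input
    ca = np.asarray(ca_atom_num, dtype=int)
    cb = np.asarray(cb_atom_num, dtype=int)

    # Per-atom case: long enough to be indexed by atom number.
    # (Checking CB as well as CA is a choice made for this sketch.)
    if len(plddt) >= max(ca.max(), cb.max()) + 1:
        return plddt[ca], plddt[cb]

    # Per-residue case: already one value per scored residue.
    if len(plddt) == numres:
        return plddt.copy(), plddt.copy()

    # Fallback: force length numres so downstream indexing cannot fail.
    warnings.warn(
        f"pLDDT length {len(plddt)} does not match numres {numres}; "
        "truncating/padding"
    )
    fixed = np.zeros(numres)          # zero padding is an assumption
    n = min(len(plddt), numres)
    fixed[:n] = plddt[:n]
    return fixed, fixed.copy()
```

In the fallback case the returned arrays have the right length, but nothing guarantees that each value still belongs to the residue at that position.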

This makes sure that all residue-level arrays (plddt, cb_plddt,
pae_matrix) are consistent with the rest of the script, and fixes the
Boltz-2 crashes I was seeing in practice.
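
The PAE handling described above can be sketched in the same way. Again this is only an illustration: residue_level_pae is a hypothetical name, and "larger" is interpreted here as both dimensions being at least numres.

```python
import warnings
import numpy as np

def residue_level_pae(pae, numres):
    """Return a PAE matrix that is (numres, numres) whenever possible."""
    pae = np.asarray(pae, dtype=float)
    rows, cols = pae.shape

    # Already residue-level: use as-is.
    if rows == numres and cols == numres:
        return pae

    # Larger in both dimensions: keep the leading numres x numres block.
    if rows >= numres and cols >= numres:
        return pae[:numres, :numres]

    # Too small to truncate: warn and return the raw matrix.
    warnings.warn(
        f"PAE shape {pae.shape} is smaller than ({numres}, {numres}); "
        "using the raw matrix"
    )
    return pae

# Example: a 4x4 matrix reduced to the 3 scored residues.
pae_matrix = residue_level_pae(np.arange(16).reshape(4, 4), 3)
print(pae_matrix.shape)
```

Truncation only makes the shapes agree; it does not by itself ensure that each row and column still corresponds to the intended residue.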

Manual testing

Ran ipsae.py on Boltz-2 outputs (structure .cif, plddt_*.npz,
pae_*.npz, confidence_*.json) where the previous version raised:
- IndexError: index 604 is out of bounds for axis 0 with size 604
- IndexError: index 600 is out of bounds for axis 0 with size 600
With this patch, ipsae.py completes successfully and produces scores for
all chain pairs (including pDockQ, pDockQ2, LIS and the various ipSAE
variants).

There are no changes to AF2/AF3 paths.

dunbrack · 2026-01-03T09:18:21Z

I think we have to figure out why the sizes of the vectors/matrices do not agree. Just taking the min of each may misalign the data. The different programs handle ligands and modified amino acids differently, and we need one pLDDT per protein residue (not per token, if a modified residue has multiple tokens). The same goes for each PAE pair. I'm working on it. If you have examples where it crashes, let me know: roland.dunbrack@fccc.edu.

Fix Boltz-2 pLDDT/PAE indexing for residue-level scores

ee985a1

dunbrack closed this Jan 3, 2026
