
@shahryary

This PR fixes two issues when running ipsae.py on Boltz-2 structures:

  1. The arrays loaded from plddt_AAAAA_model_0.npz and pae_AAAAA_model_0.npz
    were indexed with token_array.astype(bool), which assumes a 1:1
    correspondence between the Boltz-1 pLDDT/PAE vectors and the token_mask
    built from the mmCIF _atom_site table. For some Boltz-2 outputs this
    assumption does not hold, which leads to:

    • IndexError in the Boltz block:
      IndexError: index XXX is out of bounds for axis 0 with size Y
    • A second IndexError later in the pDockQ calculation:
      mean_plddt = cb_plddt[list(pDockQ_unique_residues[chain1][chain2])].mean()
  2. cb_plddt could end up with a length different from the number of scored
    residues (numres), while downstream code assumes residue-level arrays of
    length numres (e.g., for pDockQ and per-residue ipSAE).
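A toy reproduction of the second failure mode, with invented values rather than real Boltz-2 output: when cb_plddt is shorter than numres, fancy-indexing it with a residue index that is valid for numres raises the same kind of IndexError. The helper `mean_interface_plddt` and the set `residue_indices` are hypothetical stand-ins for the pDockQ expression quoted above.

```python
import numpy as np

# Invented per-residue CB pLDDT with only 5 entries, although the
# (hypothetical) structure has 6 scored residues, indexed 0..5.
cb_plddt = np.array([90.0, 85.0, 70.0, 65.0, 88.0])
numres = 6

# Stand-in for pDockQ_unique_residues[chain1][chain2]: index 5 is valid
# for numres but not for the shorter cb_plddt array.
residue_indices = {1, 3, 5}


def mean_interface_plddt(values, indices):
    """Mean pLDDT over the given residue indices (sorted for determinism)."""
    return values[sorted(indices)].mean()


try:
    mean_interface_plddt(cb_plddt, residue_indices)
except IndexError as err:
    # prints: IndexError: index 5 is out of bounds for axis 0 with size 5
    print("IndexError:", err)
```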

What this change does

For Boltz-1/Boltz-2 inputs, we now:

  • Load plddt from plddt_*.npz, scale it to 0–100, and then:

    • If len(plddt) >= max(CA_atom_num)+1, treat it as per-atom and build
      residue-level plddt / cb_plddt using CA_atom_num / CB_atom_num
      (same strategy as the AF3 code path).
    • If len(plddt) == numres, treat it as per-residue and use it directly.
    • Otherwise, fall back to truncating/padding to numres with a warning so
      that downstream calculations never hit an out-of-bounds error.
  • Load pae from pae_*.npz and ensure pae_matrix is (numres, numres):

    • If the matrix is larger, truncate to [:numres, :numres].
    • If it is exactly numres x numres, use it as-is.
    • Otherwise, emit a warning and use the raw matrix.
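A minimal sketch of the normalization rules listed above, assuming the .npz pLDDT values are on a 0–1 scale and that CA_atom_num / CB_atom_num are sequences of 0-based atom indices, one per residue. The function names are hypothetical and this is not the patch itself. Two details are assumptions of the sketch: the per-atom test checks the CB indices as well as CA (so the last residue's CB cannot fall off the end), and the fallback pads with zeros.

```python
import warnings

import numpy as np


def residue_plddt_from_boltz(plddt, numres, ca_atom_num, cb_atom_num):
    """Return (plddt, cb_plddt) as residue-level arrays.

    plddt: raw vector from plddt_*.npz, assumed to be on a 0-1 scale.
    ca_atom_num / cb_atom_num: per-residue atom indices (stand-ins for the
    script's CA_atom_num / CB_atom_num lists).
    """
    plddt = np.asarray(plddt, dtype=float) * 100.0  # 0-1 -> 0-100
    ca = np.asarray(ca_atom_num, dtype=int)
    cb = np.asarray(cb_atom_num, dtype=int)

    # Per-atom vector: every CA and CB index must be addressable.
    if len(plddt) > max(ca.max(), cb.max()):
        return plddt[ca], plddt[cb]
    # Per-residue vector: no CA/CB distinction, so reuse it for cb_plddt.
    if len(plddt) == numres:
        return plddt, plddt.copy()
    # Fallback: truncate or zero-pad to numres (padding value is assumed).
    warnings.warn(f"pLDDT length {len(plddt)} != numres {numres}; truncating/padding")
    fixed = np.zeros(numres)
    n = min(len(plddt), numres)
    fixed[:n] = plddt[:n]
    return fixed, fixed.copy()


def residue_pae_from_boltz(pae, numres):
    """Return the PAE matrix, trimmed to (numres, numres) when it is larger."""
    pae = np.asarray(pae, dtype=float)
    if pae.shape == (numres, numres):
        return pae  # already residue-level
    if pae.ndim == 2 and pae.shape[0] >= numres and pae.shape[1] >= numres:
        return pae[:numres, :numres]  # larger: keep the leading block
    warnings.warn(f"PAE shape {pae.shape} does not match numres {numres}; using it unchanged")
    return pae
```

Like the fallback described above, truncation keeps the leading entries, so it only lines up with the residues if the surplus entries sit at the end of the array.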

This makes sure that all residue-level arrays (plddt, cb_plddt,
pae_matrix) are consistent with the rest of the script, and fixes the
Boltz-2 crashes I was seeing in practice.
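One way to enforce that invariant is a single shape check right after loading, sketched here with a hypothetical `check_residue_arrays` helper that is not part of ipsae.py. Under the last PAE rule above, a matrix smaller than numres passes through unchanged; a check like this would report it immediately instead of letting it surface later as an IndexError.

```python
import numpy as np


def check_residue_arrays(numres, plddt, cb_plddt, pae_matrix):
    """Raise ValueError if any residue-level array disagrees with numres."""
    expected = {
        "plddt": (numres,),
        "cb_plddt": (numres,),
        "pae_matrix": (numres, numres),
    }
    actual = {"plddt": plddt, "cb_plddt": cb_plddt, "pae_matrix": pae_matrix}
    # Collect every mismatch so one run reports all of them at once.
    problems = [
        f"{name} has shape {np.shape(actual[name])}, expected {shape}"
        for name, shape in expected.items()
        if np.shape(actual[name]) != shape
    ]
    if problems:
        raise ValueError("; ".join(problems))
```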

Manual testing

  • Ran ipsae.py on Boltz-2 outputs (structure .cif, plddt_*.npz,
    pae_*.npz, confidence_*.json) where the previous version raised:

    • IndexError: index 604 is out of bounds for axis 0 with size 604
    • IndexError: index 600 is out of bounds for axis 0 with size 600

    With this patch, ipsae.py completes successfully and produces scores for
    all chain pairs (including pDockQ, pDockQ2, LIS and the various ipSAE
    variants).

There are no changes to the AF2/AF3 code paths.

@dunbrack
Member

dunbrack commented Jan 3, 2026

I think we have to figure out why the sizes of the vectors/matrices do not agree. Just taking the min of each may misalign the data. The different programs handle ligands and modified amino acids differently, and we need one pLDDT per protein residue (not per token, if a modified residue has multiple tokens). The same goes for each PAE pair. I'm working on it. If you have examples where it crashes, let me know: roland.dunbrack@fccc.edu.
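The misalignment concern can be made concrete. If a modified residue is split into several tokens, pairing arrays by position (or truncating to the shorter length) shifts every value after that residue. The sketch below assumes a hypothetical per-token residue map `token_res` (tokens of a residue contiguous, residues in order) is available, for example from the structure file, and keeps the first token of each residue for both pLDDT and PAE. The values are invented, and this only illustrates the idea; it is not the fix that was adopted.

```python
import numpy as np


def first_token_per_residue(token_res):
    """Index of the first token of each residue, in residue order."""
    _, first = np.unique(np.asarray(token_res), return_index=True)
    return first


def collapse_to_residues(token_plddt, token_pae, token_res):
    """Reduce per-token pLDDT (1-D) and PAE (2-D) to per-residue values."""
    first = first_token_per_residue(token_res)
    plddt = np.asarray(token_plddt)[first]
    pae = np.asarray(token_pae)[np.ix_(first, first)]  # rows and columns
    return plddt, pae


# Five residues; residue 2 is (hypothetically) split into three tokens.
token_res = np.array([0, 1, 2, 2, 2, 3, 4])
token_plddt = np.array([90.0, 80.0, 50.0, 55.0, 60.0, 70.0, 95.0])
token_pae = np.arange(49, dtype=float).reshape(7, 7)

plddt, pae = collapse_to_residues(token_plddt, token_pae, token_res)
print(plddt)           # [90. 80. 50. 70. 95.]
print(token_plddt[:5])  # [90. 80. 50. 55. 60.]  (truncation: residues 3-4 wrong)
```

Keeping the first token is one choice; averaging over a residue's tokens would be another.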

dunbrack closed this on Jan 3, 2026