HEDGEHOG

HEDGEHOG: Hierarchical Evaluation of Drug GEnerators tHrOugh riGorous filtration.

Quick Start

HEDGEHOG is a stage-based pipeline for evaluating molecular designs. It covers:

  • molecule preparation
  • descriptor calculation
  • structural filtering
  • retrosynthesis filtering
  • docking
  • docking pose filtering
  • final reports

The full pipeline can require optional external tools and receptor inputs. Start with the safe smoke run below to verify the Python environment, bundled example molecules, descriptor calculation, and structural filters before enabling retrosynthesis or docking.

Recommended install: source checkout

git clone https://github.com/LigandPro/hedgehog.git
cd hedgehog
uv sync

This is the recommended way to run HEDGEHOG end to end. The repository checkout contains the editable configs, bundled examples, TUI sources, and the modules/ workspace used to store optional tool assets such as AiZynthFinder public data.

Requirements:

  • Python 3.10+
  • uv
  • optional: Node.js >= 18 and npm for the TUI
  • optional: AiZynthFinder for retrosynthesis
  • optional: GNINA, SMINA, or Matcha for docking

PyPI install

python -m pip install hedgehog
hedgehog --help

Use the PyPI package only if you already manage your own config files and input paths. The default quick start, hedgehog setup ... workflows, and TUI usage are designed around a source checkout.

First safe run

uv run hedgehog --stage descriptors --stage struct_filters --force-new

This avoids docking and retrosynthesis. Use it as the first validation that the local environment, bundled examples, descriptor calculation, and structural filters are working.

Full pipeline

uv run hedgehog setup aizynthfinder
uv run hedgehog --auto-install

Full pipeline execution may require AiZynthFinder, GNINA/SMINA/Matcha, valid receptor structures, reference ligands, and enough CPU/GPU resources.

Input Format

The recommended molecule input is a CSV or TSV file with a smiles header:

smiles,model_name
CCO,demo
CCN,demo
c1ccccc1,demo

Required:

  • smiles

Optional:

  • model_name or name
  • mol_idx

If mol_idx is missing, HEDGEHOG assigns a stable ID and uses it to join stage outputs, docking scores, and report data.
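
As a minimal sketch of preparing such a file, the Python snippet below writes a CSV with the required smiles column and checks that the header is present before a run. It uses only the standard library. The helper names write_molecule_csv and has_smiles_header are hypothetical and not part of HEDGEHOG, and the check does not validate the SMILES strings themselves.

```python
import csv
import tempfile
from pathlib import Path


def write_molecule_csv(path, rows):
    """Write (smiles, model_name) pairs under a 'smiles,model_name' header."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "model_name"])
        writer.writerows(rows)
    return path


def has_smiles_header(path):
    """Return True if the first row of the file contains a 'smiles' column."""
    with Path(path).open(newline="") as handle:
        header = next(csv.reader(handle), [])
    return "smiles" in header


# Demonstration in a temporary directory; in practice, write to a path
# such as input/my_molecules.csv and pass it with --mols.
rows = [("CCO", "demo"), ("CCN", "demo"), ("c1ccccc1", "demo")]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_molecule_csv(Path(tmp) / "my_molecules.csv", rows)
    header_ok = has_smiles_header(csv_path)
print(header_ok)
```

The mol_idx column is left out on purpose here, so that HEDGEHOG assigns the IDs itself.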

Common Commands

# Safe smoke run
uv run hedgehog --stage descriptors --stage struct_filters --force-new

# Full pipeline after optional tools are available
uv run hedgehog --auto-install

# Run with your own molecules
uv run hedgehog --mols input/my_molecules.csv

# Run a single stage
uv run hedgehog --stage descriptors

# Run multiple selected stages
uv run hedgehog --stage descriptors --stage struct_filters

# Run docking with a live progress bar
uv run hedgehog --stage docking --progress

# Run docking without progress bar (default)
uv run hedgehog --stage docking

# Regenerate report for an existing run
uv run hedgehog report results/run_10

# Show stages / version
uv run hedgehog info
uv run hedgehog version

# Launch terminal UI
uv run hedgehog tui

Progress bar behavior in CLI runs:

  • Enabled: add --progress
  • Disabled: omit --progress (default)
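
For scripted runs, the commands above can be assembled programmatically. The sketch below builds an argument list using only flags that appear in this README (--mols, --stage, --progress, --force-new). The helper name build_hedgehog_command is hypothetical, and it assumes these flags can be combined freely and in this order, which the README does not state explicitly.

```python
import shlex


def build_hedgehog_command(stages=(), mols=None, progress=False, force_new=False):
    """Assemble a `uv run hedgehog` argument list from flags shown in the README."""
    command = ["uv", "run", "hedgehog"]
    if mols is not None:
        command += ["--mols", str(mols)]
    for stage in stages:
        # One --stage flag per selected stage, as in the multi-stage example.
        command += ["--stage", stage]
    if progress:
        command.append("--progress")
    if force_new:
        command.append("--force-new")
    return command


smoke = build_hedgehog_command(
    stages=["descriptors", "struct_filters"], force_new=True
)
print(shlex.join(smoke))
# uv run hedgehog --stage descriptors --stage struct_filters --force-new
```

The resulting list can be passed to subprocess.run(smoke, check=True) from the repository root.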

Results

Results are written under the configured output directory, usually as an auto-numbered run folder:

results/run_N/
├── stages/
├── output/
└── report.html
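
Because run folders are auto-numbered, a script often needs to locate the newest one. The following sketch picks the run_N folder with the highest N. It assumes that the output directory is named results and that a higher N means a more recent run; neither is guaranteed by HEDGEHOG, and latest_run_dir is a hypothetical helper.

```python
import re
from pathlib import Path

RUN_DIR_PATTERN = re.compile(r"run_(\d+)")


def latest_run_dir(results_root):
    """Return the run_N directory with the highest N, or None if none exists."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    best, best_index = None, -1
    for child in root.iterdir():
        match = RUN_DIR_PATTERN.fullmatch(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_index:
            best, best_index = child, int(match.group(1))
    return best


run_dir = latest_run_dir("results")
if run_dir is not None:
    print(run_dir / "report.html")
```

Comparing N numerically rather than sorting names as strings keeps run_10 ahead of run_9.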

Benchmark Results

Filtering pass rates by model class

Percentages are computed relative to the initial set for each model class. Unconditional and protein-based models each start from 80,000 molecules, and ligand-based models start from 70,000 molecules.

Stage / Pass Rate              Unconditional    Ligand-based     Protein-based
                                #mols       %    #mols       %    #mols       %
Initial                        80,000     100   70,000     100   80,000     100
Preprocessing / Init           60,407   75.51   68,858   98.37   77,396   96.75
Descriptors / Init             19,941   24.93   19,978   28.54   19,412   24.27
Structural Filters / Init       4,652    5.82    4,132    5.90    2,896    3.62
Synthesis Feasibility / Init    2,778    3.47    1,483    2.12    1,316    1.65
Docking & Binding Aff. / Init   1,441    1.80    1,084    1.55      768    0.96
3D Filters / Init                 609    0.76      396    0.57      485    0.61
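
As a worked example of how the percentages are derived, the sketch below recomputes the 3D Filters pass rates from the molecule counts and initial set sizes reported above. The function name pass_rate is illustrative only; the numbers are taken from the table.

```python
# Initial set sizes and 3D Filters survivor counts, copied from the table.
initial = {"Unconditional": 80_000, "Ligand-based": 70_000, "Protein-based": 80_000}
passed_3d = {"Unconditional": 609, "Ligand-based": 396, "Protein-based": 485}


def pass_rate(count, initial_count):
    """Percentage of the initial set that survives, rounded to two decimals."""
    return round(100.0 * count / initial_count, 2)


rates = {name: pass_rate(passed_3d[name], initial[name]) for name in initial}
print(rates)
# {'Unconditional': 0.76, 'Ligand-based': 0.57, 'Protein-based': 0.61}
```

Each rate uses the initial count as the denominator, not the preceding stage, so the percentages of consecutive stages are directly comparable within a model class.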

Top generators by final pass count

Best-performing generators within each model class, ranked by the number of molecules that pass the full HEDGEHOG pipeline.

Rank  Unconditional           Ligand-based            Protein-based
      Generator       Final   Generator       Final   Generator       Final
1     REINVENT4         163   REINVENT4 (V)     182   Dragonfly         345
2     JT-VAE            148   MolFinder          87   DrugFlow           70
3     MoLeR             116   REINVENT4 (TL)     72   ProtoBind-Diff     35
4     HierGraphVAE      108   GENTRL             25   Pocket2Mol         25
5     MolGPT             69   REINVENT4 (P)      21   ResGen             10
6     TGM-DLM             4   GCPG                8   DiffSBDD            0
7     ShEPhERD            1   PGMG                1   Dragonfly (b)       0
8     E(3)DM              0                           TargetDiff          0
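
The two benchmark tables can be cross-checked against each other: summing the final counts per model class reproduces the 3D Filters row of the pass-rate table. A short sketch of that check, with the counts copied from the tables above:

```python
# Final pass counts per generator, in rank order, from the ranking table.
final_counts = {
    "Unconditional": [163, 148, 116, 108, 69, 4, 1, 0],
    "Ligand-based": [182, 87, 72, 25, 21, 8, 1],
    "Protein-based": [345, 70, 35, 25, 10, 0, 0, 0],
}
# Molecules remaining after the 3D Filters stage, from the pass-rate table.
passed_3d = {"Unconditional": 609, "Ligand-based": 396, "Protein-based": 485}

totals = {name: sum(counts) for name, counts in final_counts.items()}
print(totals == passed_3d)
# True
```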

Documentation

For full details, use the documentation instead of this README.

To run the docs site locally:

cd docs
pnpm install
pnpm dev

License

MIT