text2sg — Political Relation Extraction as Signed Graphs

Turn Spanish-language news articles into structured signed graphs of political relations with your choice of LLM backend (Gemini, Claude, GPT, or a local model via Ollama).

"Boric respaldó las propuestas de Vallejo,           ┌─────────────┐
 mientras que Kast las calificó de 'peligrosas'."    │  endorses + │
                                              Boric ──┤             ├──▶ Vallejo
                                                      └─────────────┘
                                                      ┌─────────────┐
                                                      │  attacks  − │
                                              Kast ───┤             ├──▶ Vallejo
                                                      └─────────────┘

Each extracted relation has: from_entity, to_entity, act_type, polarity, evidence_quote, confidence.
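Downstream code may want a typed record instead of the plain dicts returned in `result["relations"]` (see Quick start). The sketch below is a minimal, hypothetical mirror of those six keys. The `Relation` class, its value types, the `"+"`/`"-"`/`"neutral"` polarity encoding and the `[0, 1]` confidence range are all assumptions, not part of text2sg.

```python
from dataclasses import asdict, dataclass, fields

# HYPOTHETICAL typed mirror of one relation dict; text2sg returns plain dicts,
# and the value types and [0, 1] confidence range below are assumptions.
@dataclass(frozen=True)
class Relation:
    from_entity: str
    to_entity: str
    act_type: str         # e.g. "endorses", "attacks"
    polarity: str         # assumed encoding: "+", "-", or "neutral"
    evidence_quote: str   # verbatim span from the article body
    confidence: float     # assumed to lie in [0, 1]

    @classmethod
    def from_dict(cls, d: dict) -> "Relation":
        # Keep only the six known keys; a missing key raises KeyError.
        rel = cls(**{f.name: d[f.name] for f in fields(cls)})
        if not 0.0 <= rel.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {rel.confidence}")
        return rel

rel = Relation.from_dict({
    "from_entity": "Gabriel Boric",
    "to_entity": "Camila Vallejo",
    "act_type": "endorses",
    "polarity": "+",
    "evidence_quote": "Boric respaldó las propuestas de Vallejo",
    "confidence": 0.9,  # illustrative value
})
print(asdict(rel)["act_type"])  # endorses
```

Freezing the dataclass makes relations hashable, so they can be deduplicated with a `set`.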

Why signed graphs?

Most political NLP stops at sentiment or named entities. Signed graphs encode who does what to whom — the minimal structure needed to detect alliances, oppositions, and polarization dynamics over time. This tool was built to feed a longitudinal network analysis of Chilean political elites across 2.1M news articles.

Install

pip install -e ".[gemini]"     # Gemini (default)
pip install -e ".[claude]"     # Claude
pip install -e ".[openai]"     # GPT
pip install -e ".[ollama]"     # local (Ollama)
pip install -e ".[all]"        # all backends + dev deps

Set your API key:

export GEMINI_API_KEY=...      # or ANTHROPIC_API_KEY / OPENAI_API_KEY

Quick start

from text2sg.genome import Genome
from text2sg.extractor import run_extraction
from text2sg.llm_backends import GeminiClient

# Load the champion genome (prompt + validation config + analysis config)
genome = Genome.from_json("genomes/id15_champion.json")

article = {
    "body": "Boric respaldó las propuestas de Vallejo, mientras que Kast las calificó de peligrosas.",
    "unions": ["Gabriel Boric", "Camila Vallejo", "José Antonio Kast"],  # known actors
}

client = GeminiClient()
result = run_extraction(article, genome, client, model="gemini-2.0-flash-lite")

for rel in result["relations"]:
    print(f"{rel['from_entity']} --[{rel['act_type']}]--> {rel['to_entity']}")
# Gabriel Boric --[endorses]--> Camila Vallejo
# José Antonio Kast --[attacks]--> Camila Vallejo

No actors pre-computed? Use end-to-end mode (NER + extraction in one call):

article_without_unions = {"body": article["body"]}  # same article, no "unions" key
result = run_extraction(article_without_unions, genome, client,
                        model="gemini-2.0-flash-lite", end2end=True)

Genome: the three-artifact design

A genome encodes the full extraction strategy as three independently mutable artifacts:

| Artifact | What it controls | Cost |
|---|---|---|
| A: `prompt_text` | The LLM extraction prompt | per-call |
| B: `ValidationConfig` | Deterministic post-processing filters | $0 |
| C: `AnalysisConfig` | Pre-extraction scaffolding (actor dossier, alias map, …) | $0 |

{
  "prompt_text": "Extract political relations...",
  "validation": {
    "require_evidence_substring": true,
    "min_quote_len": 8,
    "allowed_act_types": ["endorses", "attacks", "allies_with", ...]
  },
  "analysis": {
    "actor_dossier": true,
    "alias_map": true,
    "direction_scaffold": false
  }
}

Artifacts B and C make no LLM calls of their own, so candidate changes to them can be screened before any API budget is spent. This is what lets the evolutionary optimizer iterate cheaply.
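To illustrate why Artifact B is free to screen, here is a hypothetical sketch of deterministic filters driven by the three `validation` fields shown in the genome JSON. The `ValidationSketch` class, the `validate` function and the filter order are assumptions, not text2sg's `validation.py`; the point is that two configs can be compared on the same cached LLM output without a new API call.

```python
from dataclasses import dataclass

# HYPOTHETICAL sketch of Artifact B. Field names follow the genome JSON above;
# the filtering logic is an assumption, not text2sg's validation.py.
@dataclass
class ValidationSketch:
    require_evidence_substring: bool = True
    min_quote_len: int = 8
    allowed_act_types: frozenset = frozenset({"endorses", "attacks", "allies_with"})

def validate(relations: list[dict], body: str, cfg: ValidationSketch) -> list[dict]:
    """Drop relations that fail any deterministic check; no LLM call is made."""
    kept = []
    for rel in relations:
        quote = rel.get("evidence_quote", "")
        if len(quote) < cfg.min_quote_len:
            continue  # quote too short to count as evidence
        if cfg.require_evidence_substring and quote not in body:
            continue  # quote is not a verbatim span of the article
        if rel.get("act_type") not in cfg.allowed_act_types:
            continue  # relation type not allowed by this config
        kept.append(rel)
    return kept

body = "Boric respaldó las propuestas de Vallejo, mientras que Kast las calificó de peligrosas."
# Cached LLM output (illustrative): the third quote does not appear in the body.
cached = [
    {"act_type": "endorses", "evidence_quote": "Boric respaldó las propuestas de Vallejo"},
    {"act_type": "attacks", "evidence_quote": "calificó de peligrosas"},
    {"act_type": "attacks", "evidence_quote": "Kast rechazó a Vallejo"},
]
strict = validate(cached, body, ValidationSketch())
loose = validate(cached, body, ValidationSketch(require_evidence_substring=False))
print(len(strict), len(loose))  # 2 3
```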

Benchmark — Synthetic Oracle v2

Evaluated on 287 articles containing 914 gold relations (ground truth planted by claude-opus-4-8; difficulty 1–10; stratified by domain × register).

| Model (genome) | Precision | Recall | F0.5 |
|---|---|---|---|
| gemini-2.0-flash-lite (id15) | 0.940 | 0.884 | 0.928 |
| gemini-2.0-flash-lite (id13) | 0.928 | 0.901 | 0.922 |
| gemini-2.0-flash-lite (id18) | 0.927 | 0.905 | 0.921 |

F0.5 weights precision twice as heavily as recall. That suits downstream graph analysis, where false edges corrupt community structure more than missed edges do.
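The F0.5 column follows the standard F-beta formula, F_β = (1 + β²)·P·R / (β²·P + R), with β = 0.5. A minimal sketch that reproduces the id15 row from its precision and recall (recomputing other rows from the rounded table values may differ in the last digit):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# id15 row from the table above
print(round(f_beta(0.940, 0.884), 3))          # 0.928
print(round(f_beta(0.940, 0.884, beta=1), 3))  # 0.911 (plain F1, for comparison)
```

Because id15 trades recall for precision, its F0.5 exceeds its F1.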

The benchmark uses a fully synthetic oracle (no real corpus required): articles are generated with planted ground truth, so every score is computed against known labels and can be reproduced without any data-license restrictions.

Supported relation types

| `act_type` | Polarity | Example |
|---|---|---|
| `endorses` | + | "Boric respaldó la propuesta" |
| `attacks` | − | "Kast calificó de peligrosas" |
| `allies_with` | + | "firmaron un acuerdo conjunto" |
| `calls_on` | neutral | "exigió al ministro que" |
| `distances_from` | − | "se desmarcó de la postura" |
| `questions` | − | "cuestionó la decisión de" |
| `negotiates_with` | neutral | "negoció con la oposición" |
| `competes_with` | − | "compite directamente contra" |
| `accuses` | − | "acusó de corrupción a" |
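The polarity column is what turns relations into a signed graph. A minimal sketch, assuming relations arrive as the dicts shown in Quick start: the `POLARITY` map is copied from the table, while summing signs per directed pair and dropping neutral types are illustrative choices of this sketch, not necessarily how text2sg aggregates edges.

```python
from collections import defaultdict

# Sign per act_type, taken from the table above; "neutral" types map to 0.
POLARITY = {
    "endorses": +1, "allies_with": +1,
    "attacks": -1, "distances_from": -1, "questions": -1,
    "competes_with": -1, "accuses": -1,
    "calls_on": 0, "negotiates_with": 0,
}

def signed_edges(relations: list[dict]) -> dict[tuple[str, str], int]:
    """Aggregate relations into a net signed weight per directed (from, to) pair.

    Summing signs and skipping neutral relations is an illustrative choice.
    """
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for rel in relations:
        sign = POLARITY.get(rel["act_type"], 0)
        if sign:
            weights[(rel["from_entity"], rel["to_entity"])] += sign
    return dict(weights)

# Illustrative relations, not benchmark data.
relations = [
    {"from_entity": "Gabriel Boric", "to_entity": "Camila Vallejo", "act_type": "endorses"},
    {"from_entity": "José Antonio Kast", "to_entity": "Camila Vallejo", "act_type": "attacks"},
    {"from_entity": "José Antonio Kast", "to_entity": "Camila Vallejo", "act_type": "accuses"},
    {"from_entity": "Gabriel Boric", "to_entity": "José Antonio Kast", "act_type": "negotiates_with"},
]
print(signed_edges(relations))
# {('Gabriel Boric', 'Camila Vallejo'): 1, ('José Antonio Kast', 'Camila Vallejo'): -2}
```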

Multi-backend support

from text2sg.llm_backends import GeminiClient, AnthropicClient, OpenAIClient, OllamaClient

# Gemini (default, cheapest at scale)
client = GeminiClient()

# Claude — strong on formal political text
client = AnthropicClient()

# GPT
client = OpenAIClient()

# Local — zero API cost, useful for development
client = OllamaClient()   # requires: ollama serve && ollama pull qwen2.5:7b

All backends expose the same interface — swap without changing extraction code.
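One way to picture a shared backend interface is structural typing with `typing.Protocol`. The sketch below is hypothetical: the `LLMClient` protocol, its `generate(prompt, model)` method, `CannedClient` and `extract` are illustrative names, not the documented text2sg client API. It shows why extraction code that depends only on the interface can swap backends, including an offline stand-in like the ones offline tests typically use.

```python
import json
from typing import Protocol

# HYPOTHETICAL interface: the method name `generate(prompt, model)` is an assumption
# for illustration and is not taken from text2sg's llm_backends.py.
class LLMClient(Protocol):
    def generate(self, prompt: str, model: str) -> str: ...

class CannedClient:
    """Offline stand-in that always returns the same JSON string (no network, no API key)."""

    def __init__(self, response: str) -> None:
        self.response = response
        self.calls = 0

    def generate(self, prompt: str, model: str) -> str:
        self.calls += 1
        return self.response

def extract(body: str, client: LLMClient, model: str) -> list[dict]:
    """Depends only on the protocol, so any object with a matching `generate` works."""
    raw = client.generate(f"Extract political relations:\n{body}", model)
    return json.loads(raw)["relations"]

client = CannedClient(
    '{"relations": [{"from_entity": "Gabriel Boric", '
    '"to_entity": "Camila Vallejo", "act_type": "endorses"}]}'
)
rels = extract("Boric respaldó las propuestas de Vallejo.", client, model="any-model")
print(rels[0]["act_type"], client.calls)  # endorses 1
```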

Project structure

text2sg/
├── extractor.py      — build_prompt + LLM call + parsing + validation
├── genome.py         — Genome dataclass (A + B + C) with JSON serialization
├── validation.py     — Artifact B: deterministic post-processing ($0)
├── analysis.py       — Artifact C: pre-extraction scaffolding ($0)
├── rubric.py         — precision/recall metrics (directed + undirected)
├── llm_backends.py   — Gemini / Claude / GPT / Ollama (same interface)
├── config.py         — base config + seed prompt
└── prompts/          — system prompts for meta-agents (diagnose/propose/cross/fresh)
tests/                — 100+ tests, all run offline without an API key
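`rubric.py` reports precision and recall in both directed and undirected form. The sketch below shows the difference on exact `(from, to, act_type)` triples. The function name and the exact-match rule are assumptions; the real rubric may match entities or types differently.

```python
def precision_recall(pred: list[tuple[str, str, str]],
                     gold: list[tuple[str, str, str]],
                     directed: bool = True) -> tuple[float, float]:
    """Set-based precision/recall over (from, to, act_type) triples.

    With directed=False, (a, b, t) and (b, a, t) count as the same edge.
    """
    def key(edge):
        src, dst, act = edge
        return (src, dst, act) if directed else (frozenset((src, dst)), act)

    p, g = {key(e) for e in pred}, {key(e) for e in gold}
    tp = len(p & g)  # true positives: predicted edges that are also gold
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return precision, recall

gold = [("Gabriel Boric", "Camila Vallejo", "endorses"),
        ("José Antonio Kast", "Camila Vallejo", "attacks")]
pred = [("Camila Vallejo", "Gabriel Boric", "endorses"),   # direction flipped
        ("José Antonio Kast", "Camila Vallejo", "attacks")]
print(precision_recall(pred, gold))                  # (0.5, 0.5)
print(precision_recall(pred, gold, directed=False))  # (1.0, 1.0)
```

The gap between the two scores isolates direction errors, which matter for who-does-what-to-whom analysis.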

How the champion genome was found

The champion genome was discovered by text2graph-evolve, a multi-agent evolutionary optimizer that runs on top of this package. The optimizer treats the genome as a three-artifact individual, evaluates mutations against the synthetic oracle, and uses a panel of LLM meta-agents (diagnose, propose, cross-pollinate, fresh-design) to maintain a Pareto front over the precision/recall trade-off.
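A Pareto front keeps every genome that no other genome beats on both precision and recall. A minimal two-objective sketch of that dominance check, using the benchmark table's scores (text2graph-evolve's actual front may track more objectives or use a different rule; the `mutant` entry is hypothetical):

```python
def pareto_front(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Return genome IDs not dominated on (precision, recall); higher is better."""
    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return [gid for gid, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())]

benchmark = {"id15": (0.940, 0.884), "id13": (0.928, 0.901), "id18": (0.927, 0.905)}
print(pareto_front(benchmark))  # ['id15', 'id13', 'id18']

# A hypothetical mutant that loses to id13 on both axes falls off the front.
benchmark["mutant"] = (0.920, 0.890)
print(pareto_front(benchmark))  # ['id15', 'id13', 'id18']
```

All three benchmarked genomes are mutually non-dominated, which is why the optimizer keeps a front rather than a single winner; id15 is the champion under F0.5.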

Citation

If you use this tool in academic work:

@software{text2sg2026,
  author  = {Palacios, Benjamin},
  title   = {text2sg: Political Relation Extraction as Signed Graphs},
  year    = {2026},
  url     = {https://github.com/bpalas/text2SG}
}

License

MIT — see LICENSE. The synthetic oracle and champion genomes are included. Real corpus data (IMFD) is not redistributed.

