Dongxin (Betty) Guo

Final-year PhD @ HKU CS · Hong Kong
I prove the architectural limits of LLM reasoning, and build the systems that route around them.

About

I'm a final-year PhD candidate in the Department of Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. My research sits at the intersection of three threads that keep refusing to be separate:

What transformers can actually reason about. Tight architectural bounds, plus the tool-delegation systems those bounds force you to build.
Trustworthy LLMs in regulated settings. Compliance-grade explainability, distribution-free coverage, atomic claim verification.
Serving infrastructure that respects both. Workflow-atomic GPU scheduling with per-tenant fairness guarantees.

Theorems tell you what cannot be done. Systems make precise what can.

The cycle runs both ways: deployment surfaces the limits worth proving, and the proofs become the constraints that keep deployment honest.

🎉 News

[05.2026] 🎉 Accepted to TMLR: Tight Bounds and Fundamental Impossibility for Knowledge Editing Side Effects in Transformers. Computable bounds on edit side effects, plus an impossibility theorem ruling out perfect locality and generalization at once.
[05.2026] 🎉🎉🎉 Two papers accepted to ICML 2026: The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary (Main) and Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements (Position Track).
[05.2026] ✨ On the postdoc market for Fall 2026. Areas: trustworthy / compliance-grade AI, multi-agent systems & mechanism design, LLM theory, and serving systems. Reach out at bettyguo@connect.hku.hk.
[05.2026] 📝 Serving as a reviewer for NeurIPS, EMNLP, ACM Multimedia (Main & Dataset Tracks), and UAI.
[04.2026] 🏆🏆🏆 Four papers accepted to ACL 2026 Industry Track: FinGround (atomic claim verification), RouteNLP (conformal LLM routing), AgentEval (DAG-structured agent evaluation), and ComplianceNLP (KG-augmented regulatory gap detection).
[04.2026] 🚀🚀🚀 SAGA accepted to HPDC 2026. It's a workflow-atomic scheduler for AI agent inference on GPU clusters, with per-tenant fairness guarantees that hold under real multi-tenant load.
[03.2026] 📣 Adaptive Retrieval for Large Reasoning Models accepted to SIGIR 2026. When to retrieve during reasoning, with bounds, not heuristics.
[02.2026] 💼 Conformal-bound risk management at Brain Investing is now running against live P&L. That's our HKU FinTech spin-out, and the lab's coverage work has finally made it onto a real trading book.
[01.2026] 🛠️ Shipped multi-tenant scheduling and conformal-coverage pipelines at Stellaris AI for native-safe foundation-model deployment in regulated industries.
[09.2025] 🎓 Began the final year of my PhD at HKU CS, advised by Prof. Siu-Ming Yiu. The thesis focuses on the theory-meets-deployment cycle: bounds on transformer reasoning, and the systems those bounds force.
[08.2025] 🏅 Continuing Cyberport Incubation (2023–2025 intake). That keeps an unbroken 2018–2025 funding run going across TSSSU, HKSTP Incu-Tech, HKU iAXON Deep Tech, and Cyberport.

Theory. Production. Curation.

Nine ICML / SIGIR / ACL / HPDC / TMLR papers this cycle. Five candidate post-Transformer architectures under test. A retrieval method at 117 stars. Conformal-bound risk on a live trading book. Built across HKU CS, Stellaris AI, and Brain Investing.

⚡ At a Glance

9 papers, 2026 cycle
_{ICML × 2 · SIGIR · HPDC · TMLR · ACL Industry × 4}

85+ original OSS repos
_{architectures · research · agents · benchmarks · tools · curation}

2 in production
_{Stellaris AI · Brain Investing · conformal-bound risk, live P&L}

10 years of funding
_{TSSSU · HKSTP · Cyberport × 2 · iAXON · continuous since 2018}

🌟 Showcase

Four projects worth a second look:

📈 `realm-retrieve`

ReaLM-Retrieve · SIGIR 2026. When to retrieve during reasoning, with bounds rather than heuristics. Highest-cloned repo in this account.

_{Python · ⭐ 117 · 🍴 13 · breakout}

🚀 `SAGA`

HPDC 2026. Workflow-atomic GPU-cluster scheduler for AI agents. Within 1.31× of Bélády-optimal KV-cache eviction, with OpenMP-accelerated C++ kernels and LangChain / AutoGen / CrewAI bridges.

_{Python C++ · concrete-metric flagship}

🧬 `Vannevar`

Open-source agentic harness with citation-grade memory: source URI, temporal validity window, append-only provenance ledger. MCP-native, multi-frontend, fully self-hostable.

_{Rust · flagship infrastructure}

🧪 `research-prototypes`

Five post-Transformer architecture candidates: CASCADE, CHIMERA, HELIX, MNEMOSYNE, NOESIS. Per-token routing, tokenizer-free byte models, and latent-space continuous-thought reasoning.

_{Python · frontier research program}

📚 Selected Publications

Paper	Venue	Code
Tight Bounds and Fundamental Impossibility for Knowledge Editing Side Effects in Transformers	TMLR 2026	`ke-bounds`
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary	ICML 2026 (Main)	`deterministic-horizon`
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models	SIGIR 2026	`realm-retrieve`
Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements	ICML 2026 (Position Track)	_{position paper}
SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters	HPDC 2026	`SAGA`
FinGround: Atomic Claim Verification for Financial LLM Outputs	ACL 2026 Industry Track	`FinGround`
ComplianceNLP: KG-Augmented Regulatory Gap Detection	ACL 2026 Industry Track	`ComplianceNLP`
RouteNLP: Conformal LLM Routing	ACL 2026 Industry Track	`RouteNLP`
AgentEval: DAG-Structured Agent Evaluation	ACL 2026 Industry Track	`AgentEval`

_{Full publication list, PDFs, and BibTeX at bettyguo.github.io.}

🧭 Research Threads

Three lines that keep crossing in our papers. Each thread proves a bound and ships the system that meets it.

🧠 Reasoning & tool use

What softmax attention can realize at inference time, and what it provably cannot. The matching upper and lower bounds become the spec for the tool-delegation layer above them.

_{📄 The Deterministic Horizon (ICML 2026) · Adaptive Retrieval for Large Reasoning Models (SIGIR 2026) · code: deterministic-horizon, realm-retrieve}

🛡️ Trustworthy LLMs for regulated settings

Explainability and verification that survive financial-services audit, not benchmark conditions. Distribution-free coverage, atomic claim verification, knowledge-graph-augmented regulatory gap detection, and provable bounds on knowledge-editing side effects.

_{📄 Tight Bounds and Fundamental Impossibility for Knowledge Editing Side Effects in Transformers (TMLR) · Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements (ICML 2026, Position Track) · FinGround, ComplianceNLP (ACL 2026 Industry Track) · code: ke-bounds, FinGround, ComplianceNLP, TrustKGRAG}

⚡ Serving & agent infrastructure

Workflow-atomic GPU scheduling with per-tenant fairness guarantees that hold under real multi-tenant load. DAG-structured evaluation harnesses and conformal routing for agent cascades.

_{📄 SAGA (HPDC 2026) · RouteNLP, AgentEval (ACL 2026 Industry Track) · code: SAGA, RouteNLP, AgentEval}

📐 Method, in four habits

How we approach problems, across every thread:

Tight bounds with explicit constants. Upper and lower bounds in the same paper. No asymptotic hand-waving.
Impossibility paired with construction. When a thing can't be done, that result becomes a design constraint, not a stopping point.
Guarantees that survive reality. Distribution-free coverage, conformal prediction, fair scheduling. No idealized assumptions.
Theory and the system that meets it, shipped together. The proof tells the algorithm what to achieve; the algorithm tells the proof what's worth bounding.
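
To make "distribution-free coverage" concrete, here is a minimal sketch of split conformal prediction, the textbook construction behind that phrase. It is not taken from any repo in this account: the toy data, the fixed `model`, and the helper names `conformal_quantile` and `sample` are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample quantile used by split conformal prediction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def model(x):
    """Hypothetical fixed predictor; any black-box model could stand here."""
    return 2.0 * x


def sample(rng, n):
    """Toy data: y = 2x + Gaussian noise (an assumption for illustration)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data


rng = random.Random(0)
alpha = 0.1  # target miscoverage, i.e. 90% nominal coverage
calibration = sample(rng, 500)
test = sample(rng, 2000)

# Nonconformity score: absolute residual on held-out calibration data.
scores = [abs(y - model(x)) for x, y in calibration]
qhat = conformal_quantile(scores, alpha)

# The interval for a new x is [model(x) - qhat, model(x) + qhat].
covered = sum(abs(y - model(x)) <= qhat for x, y in test)
coverage = covered / len(test)
print(f"qhat={qhat:.3f}, empirical coverage={coverage:.3f}")
```

The guarantee needs only exchangeability between calibration and test points, not a correct model or a known noise distribution, which is the sense of "no idealized assumptions" in the list above.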

"Theorems tell you what cannot be done. Systems make precise what can."

🗂️ What lives in this account

85+ original public repos. Research code behind every paper, the architecture program we are betting on next, and the developer infrastructure our team relies on every day across HKU CS, Stellaris AI, and Brain Investing.
_{Browse the full index → github.com/bettyguo?tab=repositories}

🧬 5 _{architectures}	🔬 22 _{research}	🔌 8 _{MCP servers}	🤖 6 _{agent systems}
🧪 8 _{eval & safety}	🔭 6 _{interpretability}	🛠️ 9 _{dev tools}	📚 14 _{curated maps}

🧬 Post-Transformer architectures

_{An exploratory program: candidate sequence architectures beyond attention.}

Repo	What it is
`research-prototypes`	The program. Five post-Transformer candidates evaluated head to head. Each now has its own repo below.
`cascade-lm`	CASCADE. Cascaded, multi-stage sequence processing.
`chimera-lm`	CHIMERA. Per-token learned routing across SSM, sliding-window, and full attention.
`helix-lm`	HELIX. Tokenizer-free, byte-level hierarchical entropy-linked information exchange.
`mnemosyne-lm`	MNEMOSYNE. Memory-centric sequence architecture.
`noesis-lm`	NOESIS. Continuous-thought reasoning LM that thinks in latent space and allocates its own thinking budget.

🔬 Research code

_{One repo per paper. Theory and the system that meets it, in the same artifact.}

Reasoning, retrieval & serving

Repo	What it is
`deterministic-horizon`	ICML '26 companion. Bounds on extended reasoning, and the regime where tool delegation becomes necessary. Explicit constants.
`realm-retrieve`	ReaLM-Retrieve · SIGIR '26 companion. When to retrieve during reasoning, with bounds rather than heuristics. 117 stars.
`SAGA`	HPDC '26 companion. Workflow-atomic GPU-cluster scheduler. Within 1.31× of Bélády-optimal KV-cache eviction, with OpenMP-accelerated C++ kernels and LangChain / AutoGen / CrewAI bridges.
`RouteNLP`	ACL '26 Industry companion. Conformal-coverage router for LLM cascade serving.
`AgentEval`	ACL '26 Industry companion. DAG-structured evaluation harness for multi-step agents.
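
For readers unfamiliar with the Bélády baseline mentioned for `SAGA`, the sketch below shows one way a "within k× of Bélády-optimal" ratio could be measured on a toy trace. It is an assumption-laden illustration, not SAGA's code: LRU stands in for the online policy, miss count stands in for whatever metric the paper uses, and the trace and function names are hypothetical.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a least-recently-used cache (online policy)."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return misses


def belady_misses(trace, capacity):
    """Count misses for Belady's offline-optimal policy:
    on eviction, drop the entry whose next use is farthest in the future."""
    # next_use[i] = index of the next access to trace[i] after position i.
    next_use = [0] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # key -> index of its next use
    misses = 0
    for i, key in enumerate(trace):
        if key not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[key] = next_use[i]
    return misses


# Hypothetical trace: a cyclic scan, the classic worst case for LRU.
trace = list("abcd") * 3
online = lru_misses(trace, capacity=3)
optimal = belady_misses(trace, capacity=3)
print(f"LRU misses={online}, Belady misses={optimal}, ratio={online / optimal:.2f}")
```

Bélády's policy needs the full future trace, so it serves only as an offline yardstick; any deployable policy is scored by how close it gets.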

Trustworthy & regulated AI

Repo	What it is
`ke-bounds`	TMLR '26 companion. Computable bounds on knowledge-editing side effects, plus the impossibility result ruling out perfect locality and generalization at once.
`FinGround`	ACL '26 Industry companion. Three-stage verify-then-ground pipeline for financial document QA that detects and mitigates hallucinations.
`ComplianceNLP`	ACL '26 Industry companion. KG-augmented regulatory gap detection.
`TrustKGRAG`	Probabilistic certified robustness and anomaly detection against knowledge-graph poisoning in RAG.
`conformalized-neural-operators`	Distribution-free, spatially adaptive UQ for neural-operator PDE surrogates via physics-informed conformal prediction.
`VerBPM`	Temporal-logic framework for formal verification and repair of LLM-generated business process models.
`NeSyDisc`	Neuro-symbolic declarative process discovery with consistency guarantees.

Learning theory & systems

Repo	What it is
`SafeAnchor`	Safety-preserving continual domain adaptation of LLMs via Fisher-based subspace identification and orthogonal gradient projection.
`SigGate-GT`	Sigmoid-gated attention for graph transformers. Eliminates over-smoothing and stabilizes training via element-wise output gating.
`pac-learned-index`	PAC learning with tight VC-dimension bounds and provable sample-complexity guarantees for learned database indexes.
`JoinPAC`	PAC learnability for join cardinality estimation. Decomposition bounds, drift detection, hybrid-estimation guarantees.
`AdaptQO`	Structure-aware bandit optimization for learned query hints, with semi-bandit feedback, monotone pruning, and predictive convergence guarantees.
`neural-precond-spectral`	Spectral-equivalence theory with mesh-independent convergence bounds for neural-operator preconditioning of PDE systems.

LLM science: behavior, collapse & brains

Repo	What it is
`iterated-collapse`	Discriminative tests of iterated-learning predictions for LLM model collapse: non-monotonic compositionality, cross-linguistic regularization, the compression and communication tradeoff.
`llm-statistical-preemption`	Causal and correlational evidence for statistical preemption in LLMs, dissociating negative-knowledge acquisition from entrenchment across English verb-construction alternations.
`cross-lingual-brain-llm`	Cross-lingual alignment between brain activity and LLM representations.
`sae-brain-topography`	Sparse-autoencoder decomposition of brain–LLM alignment with a priori cortical semantic topography mapping.

🔌 MCP servers

_{Eight live integrations across our research workflow: code, data, papers, knowledge bases.}

Repo	What it is
`mcp-gateway`	Any OpenAPI 3.x spec into a Model Context Protocol server. Auth, rate-limiting, OpenTelemetry baked in.
`mcp-postgres`	Postgres MCP server for agents. Four-tier safety: role grants, pglast AST guard, per-tx envelope, audit log. Schema introspection, EXPLAIN analysis, pgvector. PG 13 to 17.
`mcp-jupyter`	MCP server for Jupyter. Live kernel state (variables, dataframes, plots, tracebacks) instead of just the `.ipynb` JSON.
`mcp-wandb-2`	Analytical MCP server for Weights & Biases: hparam importance, sweep summaries, run-delta analysis, inline charts, gated Launch actions.
`paperbase-mcp`	Research-grade MCP composing arXiv, Semantic Scholar, and OpenAlex. Related work, citation graphs, BibTeX in your chat.
`mcp-overleaf`	MCP server and Skills bundle for finishing a LaTeX paper: bib cleanup, venue rule packs, latexdiff, related-work drafting.
`obsidian_mcp`	MCP plus 7 Claude skills for Obsidian vaults. Read, search, write, and link notes from Claude / Cursor / ChatGPT. Filesystem-direct, local-first, round-trip safe.
`semantic-grep`	Local semantic code search. CLI and MCP server, all on your machine.

🤖 Agent systems & runtimes

_{Local-first when possible; verifiable when not.}

Repo	What it is
`Vannevar`	Open-source agentic harness with citation-grade memory. Every fact carries a source URI, a temporal validity window, and an append-only provenance ledger. MCP-native, multi-frontend, fully self-hostable.
`agent-memory`	Verifiable memory for LLM agents. Every recalled claim is HMAC-signed back to its originating trajectory span.
`computer_use_agent`	Open-source local-VLM browser agent. AT-tree-first routing with VLM fallback, refusals enforced in code, honest benchmarks including the failure atlas.
`whisper_agent`	Hands-free local voice agent: faster-whisper STT, local LLM with tool use, TTS. Runs entirely on your machine.
`agent-tracer-2`	OpenTelemetry-native, local-first observability for AI agents. DuckDB on disk, Next.js viewer on localhost, no SaaS. Adapters for Anthropic, OpenAI, LangGraph, AutoGen, CrewAI.
`local-deep-research`	Self-hosted deep-research agent: multi-step query planning, source synthesis, report generation. Ollama / llama.cpp / vLLM friendly, with SearXNG, FAISS, and BM25.
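
As a rough illustration of what "HMAC-signed back to its originating trajectory span" can mean for `agent-memory`, here is a minimal sketch using only the Python standard library. The payload layout, the `span_id` field, and the function names are assumptions; the repo's actual format is not described here.

```python
import hashlib
import hmac
import json


def sign_claim(key: bytes, claim: str, span_id: str) -> str:
    """Bind a claim to the span it came from with an HMAC-SHA256 tag."""
    # Canonical JSON so the same (claim, span_id) always yields the same bytes.
    payload = json.dumps(
        {"claim": claim, "span_id": span_id}, sort_keys=True
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_claim(key: bytes, claim: str, span_id: str, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_claim(key, claim, span_id)
    return hmac.compare_digest(expected, tag)


# Hypothetical key and span identifier, for illustration only.
key = b"demo-secret-key"
tag = sign_claim(key, "The invoice total is 42 USD.", "span-0007")

print(verify_claim(key, "The invoice total is 42 USD.", "span-0007", tag))
print(verify_claim(key, "The invoice total is 43 USD.", "span-0007", tag))
```

A recalled claim whose text or span reference has been altered fails verification, so the memory store cannot silently drift away from the trajectory that produced it.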

🔭 Interpretability

_{Make model internals visible, on a laptop.}

Repo	What it is
`see-the-ai-think`	Watch an LLM think. Visualizes sparse-autoencoder features firing live across every token. Runs on a laptop, no GPU required.
`llm-fossils`	Reproducible catalog of LLM behaviors that vanished as models scaled.

_{Plus web inspectors: prompt-x-ray, tokenviewer, policy-microscope, catch-the-ai-lying.}

🧪 Benchmarks, audits & red-teaming

_{Reproducible by default. Probe for contamination, leakage, and reward hacks before declaring a number.}

Repo	What it is
`agent_eval`	Open-source benchmark for Claude Code skill bundles. Pass@k plus cost plus reliability, content-addressed leaderboard across Anthropic / OpenAI / Google.
`bench_audit`	Probes for agent benchmarks: contamination, gold-answer leaks, harness-injection vulnerabilities, reward hacking. CIs on every result.
`benchprobe`	Audits AI-agent benchmarks for the eight exploit families catalogued by Berkeley / RDI.
`agent-backtest-lab`	Statistical-rigor audit harness for LLM trading-agent frameworks, with a leakage firewall.
`ai-red-team-in-a-box`	Red-team toolkit for probing LLM systems.
`rag-bench`	Small, reproducible benchmark for RAG pipelines.
`agent-arena`	Arena-style framework for head-to-head agent comparison.
`paper-replay`	Replay and reproduce paper experiments with locked seeds, environments, and artifacts.

🛠️ Developer tools & skills

_{Quality layers, lockfiles, and ergonomics for the agent stack.}

Repo	What it is
`promptlock`	Production prompt workflow: semantic diff, eval-on-PR, lockfile, drift detection, and rollback for plain-markdown prompts in your repo.
`rigging`	Typed, trust-bearing, schema-mediated coupling layer that composes heterogeneous components.
`skill-forge-2`	Quality layer for Claude Code Skills: lint, test, and bench before you ship.
`browser-skills`	15 reusable, agent-agnostic browser recipes plus an MCP server. Cookie banners, infinite scroll, calendar widgets, all solved once.
`diagram-skills`	Generate validated diagrams across Mermaid, PlantUML, Graphviz, D2, and Excalidraw. MCP server, CLI, and Claude Code skills.
`capture-engine`	Capture any web page as vector PDF, standalone HTML, or high-DPR raster. Local-only, MV3.
`paper_pod`	Local-first audio overviews for academic papers. Take an arXiv URL, PDF, or BibTeX in, get an 8 to 15 minute two-host podcast out.
`paper2repro`	Paper to reproducible experiment scaffold.
`test_forge`	Test-generation toolkit for Python research code.

📚 Curated knowledge

_{What we had to learn the hard way, written down for the next person.}

📓 Atlases & annotated notebooks

Repo	What it is
`awesome-llm-circuits-atlas`	Interactive atlas of discovered circuits and SAE features in large language models, with Colab reproductions on open-weights models.
`awesome-reasoning-models-theory`	Theory-first map of why reasoning models (o1/o3, DeepSeek-R1, Claude-thinking, Qwen-QwQ) actually work. 8 chapters, 60+ annotated papers, 13 models compared, 5 reproduction notebooks, live benchmarks.
`retrieval-from-scratch`	Modern Information Retrieval from scratch in PyTorch. BM25, dense bi-encoders, ColBERT late interaction, cross-encoder reranking, and RAG, in annotated notebooks that run on a single GPU.

🗺️ Maps, lists & roadmaps

Repo	What it is
`awesome-why-llms-work`	Falsifiable-hypothesis atlas of why LLMs work. Five competing research programmes, 41 tracked claims with epistemic status (🟢🟡🔴⚪) and named falsifiers.
`awesome-llm-reasoning-foundations`	Curated, rigorously-verified map of the theoretical foundations of LLM reasoning: transformer expressivity, chain-of-thought error bounds, circuit complexity, logical characterizations, learnability.
`llm-impossibility-results`	Verified, assumption-explicit catalog of published impossibility and lower-bound results for LLMs and AI agents: circuit-complexity ceilings, hallucination bounds, watermarking impossibility, alignment.
`awesome-llm-theory`	Companion list: theory papers for LLM behavior, expressiveness, and learnability.
`build-your-own-ai`	Master modern AI by building it from scratch: curated index of the best build-it-yourself guides for tokenizers, attention, training, RAG, agents, and evals.
`awesome-research-agents`	Opinionated, curated list of agents, skills, MCP servers, and tools ML researchers actually use.
`ai-engineer-roadmap`	Interactive end-to-end roadmap for AI engineers. 12 stages, 122 nodes, 276 link-verified resources from math prerequisites to the research frontier.
`harness-engineer-roadmap`	Interactive roadmap for harness engineering: the agent loop, tool layers, context engineering, memory, retrieval, eval.
`awesome-llm-trading-agents`	Curated, verified map of the LLM-trading-agent ecosystem: frameworks, papers, and tools.
`OpenProblems`	A credit-rating-agency-style platform for open problems in LLM and AI research.
`llm-interview-prep`	Interview-prep notebook for LLM and ML-systems roles.

🏭 Deployment

Translational work. Coverage proofs and scheduling guarantees in production, against real workloads.

Stellaris AI. Conformal-coverage pipelines and multi-tenant scheduling for native-safe foundation models, in regulated deployments.
Brain Investing. HKU FinTech spin-out. Conformal-bound risk management running against live P&L. The lab's coverage work, in a real trading book.

🏅 Service & Recognition

Peer review at NeurIPS, EMNLP, ACM Multimedia (Main & Dataset Tracks), UAI.
Mentoring LLM-infrastructure engineers at Stellaris AI on conformal coverage, agent evaluation, and multi-tenant scheduling.
Ten consecutive years of competitive funding (2018–2025): TSSSU, HKSTP Incu-Tech, Cyberport (×2 intakes), HKU iAXON Deep Tech.

📬 Availability

Postdoc, Fall 2026

Open to positions where theory and deployment share a research agenda.

Areas. Trustworthy & compliance-grade AI · Multi-agent systems & mechanism design · LLM theory (descriptive complexity, in-context reasoning) · Serving systems for inference.

Reach me at bettyguo@connect.hku.hk

_{Dongxin (Betty) Guo · The University of Hong Kong · Department of Computer Science}

_{homepage · scholar · orcid · openreview · linkedin}

_{Last updated May 2026}