The Singular agentic framework for the JVM. A provider-agnostic Model interface, a streamable steerable session SDK, sandboxed code execution, structured output, durability primitives, and end-to-end tracing — for tool-using agents that run reliably in production.
Simple, explicit, no magic. No annotation-driven DI, no reflective surprises.
Published to Maven Central under the ai.singlr namespace. MIT licensed.
- Java 25+
- Maven 3.9+
Pick what you need — each jar is published independently:
| Artifact | What it gives you | External deps |
|---|---|---|
| helios-core | Model interface + value types, Tool / CommandGrant, OutputSchema + Provenanced<T>, TokenCounter, FaultTolerance, TraceListener / SpanListener, durability primitives (RunStore, ToolCallJournal), SecretRegistry + Redactor, CostEstimate / CostCalculator | None |
| helios-session | v2 SDK: long-lived AgentSession running an agent loop on a virtual thread. SessionPresets, hooks, declarative Permission, file tools, MemoryBackend, ExecutionProvider, ContextCompactor | Jackson 3.x |
| helios-runtime | Helidon HTTP/SSE surface for helios-session: POST /sessions, SSE /events, long-poll /result | Helidon SE, Jackson 3.x |
| helios-gemini | Google Gemini provider (Interactions API) | Jackson 3.x |
| helios-anthropic | Anthropic Claude provider (Messages API) | Jackson 3.x |
| helios-openai | OpenAI GPT provider (Responses API) | Jackson 3.x |
| helios-repl | Sandboxed JShell substrate (JvmSandbox, ReplSession, CodeExecutionTool) + CodeActPreset for session-level RLM/CodeAct shapes | Jackson 3.x |
| helios-onnx | Local embeddings via ONNX Runtime | ONNX Runtime, DJL Tokenizers |
| helios-persistence | PostgreSQL-backed PromptRegistry, TraceStore, and durability (PgRunStore + PgToolCallJournal) | Helidon DbClient |
Most agentic apps want helios-session + one provider. Drop down to helios-core's Model directly for one-shot calls; expose a session over HTTP with helios-runtime.
Helios is used in production but has documented limitations that you should understand before adopting it. This section is intentionally explicit so you can decide whether the current state fits your deployment risk profile.
- Fail-secure defaults. RuntimeServer.Builder binds 127.0.0.1 (loopback) by default; external traffic is explicit opt-in via withHost("0.0.0.0"). The HTTP routes are unauthenticated and create real model-spending sessions, so deployers that expose them externally should sit behind authenticated fronting infrastructure.
- Cooperative cancellation, no Error swallowing. RetryPolicy and the session loop catch Exception, not Throwable: OOM / StackOverflow / LinkageError escape cleanly so the host JVM dies rather than retrying a corrupted process. Tool.execute preserves the calling thread's interrupt status when the executor's exception chain carries InterruptedException.
- Sandboxed subprocess execution with dedicated RPC channel and descendant reaping. The JvmSandbox ↔ host RPC runs on a per-session Unix domain socket bound in a private temp directory (mode 0700); the host accepts exactly one connection then closes the listener, so no other process, including a JShell snippet inside the subprocess, can forge frames by writing to subprocess stdout. (Earlier versions used a \0RPC:-prefixed channel multiplexed on stdout, which a snippet could forge via new PrintStream(new FileOutputStream(FileDescriptor.out)). C1 in the original review.) Subprocess stdout stays for captured output only and is never parsed as RPC. JvmSandbox.close() and its JVM-shutdown-hook path snapshot descendants before killing the parent, then forcibly destroy the snapshot, so snippets that call Runtime.exec(...) cannot orphan grandchildren past sandbox lifetime. Uninterruptible JShell snippets are escalated through a documented ladder (Thread.interrupt → 1s join → jshell.stop() → 1s join).
- Path-traversal jails on every filesystem boundary. WorkspaceRoot backing the session's Read / Grep / Glob tools (lexical .. / absolute refusal + toRealPath symlink check), OnnxModelDownloader (refuses absolute paths and traversal in HuggingFace-supplied filenames), and CommandGrant (per-call temp working directory, argv pre-scan refusing registered secrets).
- SQL identifier validation on the persistence schema name. PgConfig rejects schema names that don't match Postgres' unquoted-identifier shape, so configuration-driven schema names cannot inject SQL via the qualifier substitution.
- Secret redaction at the documented boundaries. CommandGrant flows model-visible output through a shared SecretRegistry-derived Redactor; the workspace Read and Grep tools accept an optional Redactor overload that does the same for any file content they return to the model. The REPL SandboxBindingsListener (operator telemetry of sandbox working memory) also redacts against the registry. ModelConfig.toString() redacts the API key and header values so accidental log.info("config={}", cfg) callsites don't leak credentials.
- No Error propagation from sandbox bindings collection. JvmSandboxBootstrap.collectBindings catches Throwable per binding so a malicious toString() that throws StackOverflowError / OutOfMemoryError yields an <error: …> stub instead of escaping into the virtual thread's uncaught handler.
- Bounded HTTP/SSE backpressure. AgentSession's event publisher uses SubmissionPublisher.offer(...) with bounded timeouts (1s for routine events, 30s for control events like LoopEnded / QuestionAsked / Error). A slow or paused SSE client cannot pin the agent loop. Routine events may be dropped on a sustained slow consumer with a FINE log; control events use the longer timeout to maximise delivery.
- Test coverage. JaCoCo gates at 95% instruction / 90% branch on helios-core, helios-session, helios-runtime. Tests across the project: ~3,200 unit tests. Provider modules (helios-gemini, helios-anthropic, helios-openai) are gated by environment variables for live-API integration tests; CI runs unit tests without those keys.
These are documented intentionally — they're real and you should plan around them. We are working through them in priority order; this README will move them from "limitation" to "hardened" as each lands.
- Persistence layer stores trace/journal payloads verbatim, by design. helios_tool_calls.{args, output, error}, helios_traces.{input_text, output_text, attributes}, and helios_spans.attributes capture exactly what the loop and tools produced. That faithful capture is what makes the audit trail useful for evals, debugging, and post-hoc analysis; forcing redaction would replace the actual signal with placeholders. The deployer's responsibility: redact at the source if needed. CommandGrant already redacts model-visible output through SecretRegistry contractually; ReadTool / GrepTool and JShellExecutionProvider redact through any registry you pass them. Custom tools that handle registered secrets are the deployer's call. Deployers who want defense-in-depth at the persistence boundary without wrapping the journal themselves can opt in via PgConfig.Builder.withRedactor(registry.redactor()).
- No backwards compatibility guarantee. Helios is pre-1.0 in spirit: the public API may change between minor versions. Tag a known good version and pin it.
- FinishReason.LENGTH is terminal. When the model truncates output at its max_output_tokens cap, the loop terminates with ResultMessage.ErrorDuringExecution(kind="max-tokens") rather than re-issuing. Deployers that previously relied on the loop continuing past LENGTH should adjust max_output_tokens in ModelConfig. This is a deliberate change from earlier behaviour: silent re-issue burned maxTurns of budget without making progress.
- Live integration coverage is environment-gated. Provider integration tests run in CI under deployer-supplied API keys; you should add equivalent live tests for the workloads that matter to you before relying on the SDK in production-critical paths.
If any of these is a deal-breaker for your use case, open an issue — prioritisation tracks demand.
<dependency>
<groupId>ai.singlr</groupId>
<artifactId>helios-session</artifactId>
<version>${helios.version}</version>
</dependency>
<dependency>
<groupId>ai.singlr</groupId>
<artifactId>helios-anthropic</artifactId> <!-- or -gemini, -openai -->
<version>${helios.version}</version>
</dependency>

JPMS:

requires ai.singlr.session;
requires ai.singlr.anthropic;

Three ways in, depending on what you're building.
Direct Model API for the "send messages, get a response" shape. No session, no loop, no tools — just the model.
try (var model = new AnthropicProvider().create(
AnthropicModelId.CLAUDE_SONNET_4_6.id(),
ModelConfig.newBuilder().withApiKey(System.getenv("ANTHROPIC_API_KEY")).build())) {
var response = model.chat(List.of(
Message.system("You are a helpful assistant."),
Message.user("What is the capital of France?")));
System.out.println(response.content());
}

All providers implement the same Model interface; swap providers without touching the rest of your code.

AgentSession is a long-lived object that runs an agent loop on a virtual thread: send messages, subscribe to events, interrupt mid-turn, or call runBlocking for synchronous results. This is the shape that shows up in production: multi-turn tool use, mid-run user steering, permission gates, observability.
try (var session = AgentSession.create(
SessionPresets.workspace(model, Path.of("/path/to/repo")).build())) {
var terminal = session.runBlocking(UserMessage.text(
"Summarise the public API of the session module."));
System.out.println(((ResultMessage.Success) terminal).result());
}

See Sessions below.
helios-runtime exposes a session over REST + SSE so non-JVM clients can drive it. POST /sessions, POST /sessions/{id}/messages, SSE GET /sessions/{id}/events, long-poll GET /sessions/{id}/result?timeout=<s>, DELETE /sessions/{id}. See runtime/ for routes.
AgentSession.create(SessionOptions) returns a session that drives an agent loop on a virtual thread the first time you send. Key surfaces:
- send(UserMessage): queue a message (steering queue, drained per-iteration).
- events(): Flow.Publisher<QueryEvent> for streaming subscribers (15 event subtypes: text, thinking, tool use, hook fired, context warning/edited, etc.).
- result(): CompletableFuture<ResultMessage> for blocking await.
- runBlocking(msg) and runBlocking(msg, OutputSchema): synchronous convenience for one-shot use.
- interrupt(reason): queue a synthetic mid-run user message.
- answer(qid, response): resolve a pending AskUserQuestion.
- close(): AutoCloseable; cancels the loop, settles the result future, drains the publisher.
SessionPresets ships three curated SessionOptions.Builder factories:
- minimal(model): model only, defaults for everything else.
- readOnly(model, root): Read / Glob / Grep / LS rooted at the workspace, gated by Permission.planMode().
- workspace(model, root): same read tools plus MemoryWrite plus Permission.defaultInWorkspace() (reads + memory allowed, writes / edit / execute asked).
Stream events for a live UI:
session.events().subscribe(new Flow.Subscriber<QueryEvent>() {
public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
public void onNext(QueryEvent event) {
if (event instanceof QueryEvent.AssistantText t) System.out.print(t.text());
}
public void onError(Throwable t) {}
public void onComplete() {}
});
session.send(UserMessage.text("Refactor the loop module's package-info comments."));

Hooks, declarative permissions, cost tracking, budget caps, session limits, context compaction, the HTTP + SSE surface: see session/README.md for the full quickstart.
The loop terminates when the model's FinishReason indicates the session is done — STOP (natural completion), CONTENT_FILTER (refusal), ERROR (provider error), or LENGTH (response truncated at max_output_tokens). LENGTH produces ResultMessage.ErrorDuringExecution(kind="max-tokens") rather than re-issuing — deployers that need higher per-turn ceilings should raise ModelConfig.maxOutputTokens instead of relying on the loop to retry. See Known limitations for the rationale (the previous behaviour silently burned maxTurns of budget without making progress).
AgentSession.events() is a SubmissionPublisher<QueryEvent> with a 256-item per-subscriber buffer. The agent loop emits via offer(...) with a bounded timeout, not blocking submit: routine events (text, tool use, hook fired) wait up to 1 s before dropping on a slow consumer; control events (LoopEnded, QuestionAsked, Error) wait up to 30 s. A subscriber that pauses indefinitely will see a gap in its event stream, but cannot pin the agent loop. Drops are logged (FINE for routine, WARNING for control).
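The bounded-offer behaviour described above can be sketched with the JDK's SubmissionPublisher alone. This is a minimal illustration, not the AgentSession implementation: BoundedOfferDemo and StalledSubscriber are hypothetical names, and the buffer size (4) and timeout (10 ms) are shrunk so the drop path triggers quickly.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedOfferDemo {

  /** Offers each item with a bounded wait and returns how many were dropped. */
  static int publishAll(SubmissionPublisher<String> publisher, int count, long timeoutMillis) {
    AtomicInteger dropped = new AtomicInteger();
    for (int i = 0; i < count; i++) {
      publisher.offer("event-" + i, timeoutMillis, TimeUnit.MILLISECONDS,
          (subscriber, item) -> {
            dropped.incrementAndGet(); // a real loop would log the drop here
            return false;              // false = give up on this item, do not retry
          });
    }
    return dropped.get();
  }

  /** Never requests items, standing in for a paused SSE client. */
  static final class StalledSubscriber implements Flow.Subscriber<String> {
    public void onSubscribe(Flow.Subscription s) {}
    public void onNext(String item) {}
    public void onError(Throwable t) {}
    public void onComplete() {}
  }

  static int demo() {
    try (var publisher = new SubmissionPublisher<String>(ForkJoinPool.commonPool(), 4)) {
      publisher.subscribe(new StalledSubscriber());
      return publishAll(publisher, 20, 10); // the producer returns even though nobody consumes
    }
  }

  public static void main(String[] args) {
    System.out.println("dropped=" + demo());
  }
}
```

The producer thread pays at most the timeout per event, which is the property that keeps a stalled subscriber from pinning the loop.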
The agent loop estimates context fill through a pluggable TokenCounter (default char-based, ~4 chars/token) and reacts at two watermarks:
- 0.85 of SessionLimits.maxContextTokens() → emits QueryEvent.ContextWarning(usagePct) once.
- 0.95 → invokes the configured ContextCompactor, swaps the returned history in, and emits QueryEvent.ContextEdited(removedBlocks, tokensBefore, tokensAfter).
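As a rough illustration of the two watermarks, the sketch below estimates tokens at about 4 characters each and maps the fill ratio to an action. ContextWatermarks and its Action enum are hypothetical names used only for this example; the real loop works through TokenCounter and SessionLimits.

```java
import java.util.List;

final class ContextWatermarks {
  enum Action { NONE, WARN, COMPACT }

  private final long maxContextTokens;
  private boolean warned; // the 0.85 warning fires only once

  ContextWatermarks(long maxContextTokens) {
    this.maxContextTokens = maxContextTokens;
  }

  /** Char-based estimate: roughly 4 characters per token, rounded up. */
  static long estimateTokens(List<String> history) {
    long chars = 0;
    for (String message : history) chars += message.length();
    return (chars + 3) / 4;
  }

  /** Integer arithmetic avoids floating-point edge cases at the exact thresholds. */
  Action check(List<String> history) {
    long tokens = estimateTokens(history);
    if (tokens * 100 >= maxContextTokens * 95) return Action.COMPACT;
    if (tokens * 100 >= maxContextTokens * 85 && !warned) {
      warned = true;
      return Action.WARN;
    }
    return Action.NONE;
  }

  public static void main(String[] args) {
    var marks = new ContextWatermarks(100);
    System.out.println(marks.check(List.of("x".repeat(340)))); // 85 tokens of 100
    System.out.println(marks.check(List.of("x".repeat(380)))); // 95 tokens of 100
  }
}
```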
The default DropMiddleToolResultsCompactor preserves the system prompt + opening turn and the most recent trajectory, summarising the middle via one model call. It walks slice boundaries to keep tool_call/tool_result pairs together (providers reject histories where these split). The summary call runs on a virtual thread under a configurable timeout (default 60s) and reports its Usage so spend gates through SessionLimits.maxBudgetMicroUsd.
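The boundary walk can be sketched as follows, under two simplifying assumptions: every tool result directly follows its call, and the summary is a fixed placeholder string rather than a model call. MiddleDropSketch, Kind, and Msg are hypothetical types for this example only.

```java
import java.util.ArrayList;
import java.util.List;

final class MiddleDropSketch {
  enum Kind { SYSTEM, USER, ASSISTANT, TOOL_CALL, TOOL_RESULT }

  record Msg(Kind kind, String text) {}

  /** Keeps a head and a tail slice, never cutting directly in front of a TOOL_RESULT. */
  static List<Msg> compact(List<Msg> history, int keepHead, int keepTail, String summary) {
    int n = history.size();
    int headEnd = Math.min(keepHead, n); // exclusive end of the head slice
    // Extend the head so it does not end between a call and its result.
    while (headEnd < n && history.get(headEnd).kind() == Kind.TOOL_RESULT) headEnd++;
    int tailStart = Math.max(n - keepTail, headEnd); // inclusive start of the tail slice
    // Pull the tail back so it starts at the call, not at an orphaned result.
    while (tailStart > headEnd && tailStart < n
        && history.get(tailStart).kind() == Kind.TOOL_RESULT) tailStart--;
    if (tailStart <= headEnd) return history; // no middle left to drop
    List<Msg> out = new ArrayList<>(history.subList(0, headEnd));
    out.add(new Msg(Kind.USER, summary));
    out.addAll(history.subList(tailStart, n));
    return out;
  }

  static List<Msg> demo() {
    List<Msg> history = List.of(
        new Msg(Kind.SYSTEM, "system prompt"),
        new Msg(Kind.USER, "opening request"),
        new Msg(Kind.ASSISTANT, "plan"),
        new Msg(Kind.TOOL_CALL, "grep #1"),
        new Msg(Kind.TOOL_RESULT, "matches #1"),
        new Msg(Kind.ASSISTANT, "next step"),
        new Msg(Kind.TOOL_CALL, "read #2"),
        new Msg(Kind.TOOL_RESULT, "contents #2"),
        new Msg(Kind.ASSISTANT, "answer"));
    // keepTail=2 would start on "contents #2"; the walk widens it to include "read #2".
    return compact(history, 2, 2, "[summary of dropped middle]");
  }

  public static void main(String[] args) {
    demo().forEach(m -> System.out.println(m.kind() + ": " + m.text()));
  }
}
```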
Override head/tail policy via the Builder, swap the whole compactor through SessionOptions.Builder.withContextCompactor(...), opt out with ContextCompactor.disabled(), or do ad-hoc rewrites by returning HookOutcome.mutate(Map.of("history", rewritten)) from a PreModelTurnHook. See session/README.md for the full surface.
Model is AutoCloseable and holds long-lived resources (HTTP connection pool, file descriptors). Build it once at app startup, share across many sessions, close once at app shutdown. The component that constructs a Model owns its lifecycle — a session does NOT close its Model because closing one model would break sibling sessions sharing it.
try (var model = new AnthropicProvider().create(modelId, config)) {
for (var request : requests) {
try (var session = AgentSession.create(SessionPresets.workspace(model, repo).build())) {
session.runBlocking(UserMessage.text(request));
}
}
} // HttpClient and connection pool released here

AgentSession is AutoCloseable and owns its loop, per-session publisher executor, and any pending tool/question state. Always use try-with-resources; close() cancels the loop, settles the result future, and waits up to 5s for the publisher executor to drain.
ReplSession is AutoCloseable and owns its sandbox subprocess. JvmSandbox also installs a JVM shutdown hook so a leaked sandbox is force-killed on host JVM exit. Both the explicit close and the shutdown hook snapshot the subprocess's descendants and forcibly destroy them before the parent — snippets that call Runtime.exec(...) cannot leave orphaned grandchildren past the sandbox lifetime.
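The snapshot-then-destroy idea can be illustrated with the JDK's ProcessHandle API. This is a sketch under assumptions, not JvmSandbox's code: ProcessTreeReaper is a hypothetical name, the 5-second wait is arbitrary, and the demo spawns a short-lived `java -version` child purely so there is something to reap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ProcessTreeReaper {

  /** Snapshots descendants while the parent is alive, then force-kills them and the parent. */
  static int killTree(ProcessHandle parent) {
    // Take the snapshot first: once the parent exits, its children are re-parented
    // and no longer show up under parent.descendants().
    List<ProcessHandle> snapshot = parent.descendants().toList();
    int requested = 0;
    for (ProcessHandle descendant : snapshot) {
      if (descendant.destroyForcibly()) requested++;
    }
    if (parent.destroyForcibly()) requested++;
    try {
      parent.onExit().get(5, TimeUnit.SECONDS); // bounded wait for the parent to go away
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException | TimeoutException e) {
      // best effort: the caller can inspect isAlive() afterwards
    }
    return requested;
  }

  static boolean demo() {
    String java = ProcessHandle.current().info().command().orElse("java");
    try {
      Process child = new ProcessBuilder(java, "-version")
          .redirectErrorStream(true)
          .redirectOutput(ProcessBuilder.Redirect.DISCARD)
          .start();
      killTree(child.toHandle());
      return !child.isAlive();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("child gone: " + demo());
  }
}
```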
var weatherTool = Tool.newBuilder()
.withName("get_weather")
.withDescription("Current weather for a city")
.withParameter(ToolParameter.newBuilder()
.withName("city").withType(ParameterType.STRING).withRequired(true).build())
.withExecutor((args, ctx) -> ToolResult.success("72°F in " + args.get("city")))
.build();

Mark read-only tools as idempotent at build time: Tool.newBuilder().withIdempotent(true). The session loop dispatches non-idempotent tools through a no-retry fault-tolerance envelope so side-effecting calls never replay.
CommandGrant produces a Tool that lets the model invoke a single CLI binary under tight controls. Secrets registered via withEnv(...) are auto-redacted from any model-visible output:
var registry = new SecretRegistry();
var gh = CommandGrant.builder("gh")
.withSecretRegistry(registry)
.withEnv("GH_TOKEN", System.getenv("GH_TOKEN")) // value never reaches the model
.withTimeout(Duration.ofSeconds(30))
.withMaxOutputBytes(50_000)
.withArgValidator(args ->
args.isEmpty() || !"auth".equals(args.get(0))
? Optional.empty()
: Optional.of("'gh auth' is not allowed via this grant"))
.build();

Hardening is on by default: binary path pinned at build, argv-only (no shell), ProcessBuilder env cleared then injected so the JVM's environment never leaks into the child, argv pre-scan refuses any registered secret value (forces env-only secret transport), per-call temp working directory, output capped, descendants killed on timeout, stderr hidden from the model unless withStderrToModel(true). The same SecretRegistry can be shared across grants and any tool that produces model-visible output, so cross-tool redaction is automatic.
The session's Read / Grep / Glob tools accept any WorkspaceRoot, not just the agent's working directory — point them at a curated reference tree to give the model native lexical search over operator-supplied content (the "no vector DB needed for a bounded corpus" pattern). Read and Grep have a 3-arg / 2-arg overload that takes a Redactor; share one SecretRegistry with your CommandGrants and a token written to a file by gh auth login stays scrubbed when the agent later reads it back.
var corpus = WorkspaceRoot.of(Path.of("/var/kb/support"));
var tracker = InMemoryFileTracker.create();
var redactor = registry.redactor();
var bindings = List.of(
ReadTool.binding(corpus, tracker, redactor), // text-body output redacted
GrepTool.binding(corpus, redactor), // match content redacted; path:line: prefix left alone
GlobTool.binding(corpus)); // paths only, no redactor

Path-jail (.. / absolute refusal + toRealPath symlink check) is built into WorkspaceRoot; per-file size caps, per-output byte caps, hidden-directory pruning, and binary skip are built into the tools. Permission.planMode() or a curated Permission policy enforces read-only at the session level; there is no Write tool to disable.
OutputSchema.of(MyRecord.class) generates a JSON Schema from a Java record. The model returns a typed value, validated against the schema; shape mismatches surface as StructuredOutputParseException with a per-field diff.
record Sentiment(String label, double confidence) {}
var response = model.chat(messages, OutputSchema.of(Sentiment.class));
Sentiment s = response.parsed();

For provenance-tagged output (Basis-style: every field carries source citations + confidence), wrap with OutputSchema.provenancedOf(MyOutput.class):
var schema = OutputSchema.provenancedOf(MappingProposal.class);
var response = model.chat(messages, schema);
Provenanced<MappingProposal> result = response.parsed();
// result.output() = the typed output; result.provenance() = per-field sources + confidence

ProvenanceValidator.DEFAULT rejects MEDIUM/HIGH confidence entries that have no sources. This is the calibration mechanism that prevents the model from rubber-stamping HIGH on every field. Custom validators via OutputSchema.provenancedOf(MyOutput.class, validator).
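The default rule reads naturally as a small predicate. The sketch below uses hypothetical stand-in types (Confidence, FieldProvenance, ProvenanceRuleSketch); the actual Provenanced<T> and ProvenanceValidator shapes may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class ProvenanceRuleSketch {
  enum Confidence { LOW, MEDIUM, HIGH }

  /** Hypothetical per-field provenance entry: field name, claimed confidence, cited sources. */
  record FieldProvenance(String field, Confidence confidence, List<String> sources) {}

  /** Returns the fields that claim MEDIUM or HIGH confidence without citing any source. */
  static List<String> violations(List<FieldProvenance> entries) {
    List<String> bad = new ArrayList<>();
    for (FieldProvenance entry : entries) {
      boolean confident = entry.confidence() != Confidence.LOW;
      if (confident && entry.sources().isEmpty()) bad.add(entry.field());
    }
    return bad;
  }

  public static void main(String[] args) {
    System.out.println(violations(List.of(
        new FieldProvenance("sourceColumn", Confidence.HIGH, List.of()),
        new FieldProvenance("targetColumn", Confidence.MEDIUM, List.of("schema.sql")),
        new FieldProvenance("notes", Confidence.LOW, List.of()))));
  }
}
```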
Through session.runBlocking(message, schema), the loop intercepts StructuredOutputParseException, injects a corrective USER turn carrying the diff, and re-iterates — same self-correction shape as the v1 RLM submit() had.
The session's events() publisher is the primary streaming surface — see Sessions. For direct model use without a session:
try (var events = model.chatStream(messages, tools)) {
while (events.hasNext()) {
switch (events.next()) {
case StreamEvent.TextDelta d -> System.out.print(d.text());
case StreamEvent.ToolCallComplete tc -> System.out.println("Called: " + tc.toolCall().name());
case StreamEvent.Done d -> System.out.println("\n" + d.response().content());
default -> {}
}
}
}

The iterator is Closeable; use try-with-resources to release the underlying connection promptly. For the session loop's Flow.Publisher<ModelChunk> overload (with cancellation), see Model.chatStream(messages, tools, cancellation).
Composable retry, circuit breaker, and timeout — zero dependencies:
var ft = FaultTolerance.newBuilder()
.withRetry(RetryPolicy.newBuilder()
.withMaxAttempts(3)
.withBackoff(Backoff.exponential(Duration.ofMillis(500), 2.0))
.build())
.withCircuitBreaker(CircuitBreaker.newBuilder()
.withFailureThreshold(5)
.withHalfOpenAfter(Duration.ofSeconds(30))
.build())
.withOperationTimeout(Duration.ofMinutes(5))
.build();
var result = ft.execute(() -> model.chat(messages)); // checked exceptions: see Javadoc

FaultTolerance.withoutRetry() returns a sibling envelope with retry stripped (circuit breaker + timeout retained). The session loop's tool dispatch uses this for non-idempotent tools so side-effecting calls never replay.
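To make the retry semantics concrete, here is a dependency-free sketch of exponential-backoff retry that catches Exception but lets Error escape, and that collapses to a single attempt for non-idempotent work. RetrySketch and its method names are hypothetical; this is not the FaultTolerance implementation, and it omits the circuit breaker and timeout.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class RetrySketch {

  static <T> T call(Callable<T> op, int maxAttempts, Duration initialDelay,
      double multiplier, boolean idempotent) throws Exception {
    if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
    int attempts = idempotent ? maxAttempts : 1; // side-effecting calls never replay
    long delayMillis = initialDelay.toMillis();
    Exception last = null;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) { // deliberately not Throwable: Errors propagate untouched
        if (e instanceof InterruptedException) {
          Thread.currentThread().interrupt(); // preserve interrupt status, do not retry
          throw e;
        }
        last = e;
      }
      if (attempt < attempts) {
        Thread.sleep(delayMillis);
        delayMillis = (long) (delayMillis * multiplier);
      }
    }
    throw last;
  }

  /** Runs an operation that fails twice before succeeding; returns how often it was invoked. */
  static int attemptsUsed(boolean idempotent) {
    AtomicInteger calls = new AtomicInteger();
    Callable<String> flaky = () -> {
      if (calls.incrementAndGet() < 3) throw new IllegalStateException("transient");
      return "ok";
    };
    try {
      call(flaky, 3, Duration.ofMillis(1), 2.0, idempotent);
    } catch (Exception ignored) {
      // the non-idempotent path is expected to fail after its single attempt
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("idempotent attempts=" + attemptsUsed(true));
    System.out.println("non-idempotent attempts=" + attemptsUsed(false));
  }
}
```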
The session's events() publisher is the primary observability surface. Subscribe a Flow.Subscriber<QueryEvent> (or a per-phase OnStreamEventHook for in-loop interception) and pick the events you care about — see the 15-subtype sealed hierarchy in ai.singlr.session.QueryEvent (AssistantText, ToolUse, ToolResult, ContextWarning, ContextEdited, HookFired, TurnEnded, LoopEnded, etc.).
session.events().subscribe(new Flow.Subscriber<QueryEvent>() {
public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
public void onNext(QueryEvent ev) {
switch (ev) {
case QueryEvent.ToolUse t -> log.info("tool: {} args={}", t.call().name(), t.call().arguments());
case QueryEvent.ContextWarning w -> metrics.recordContextFill(w.usagePct());
case QueryEvent.LoopEnded e -> auditSink.flush(e.result());
default -> { /* skip */ }
}
}
public void onError(Throwable t) {}
public void onComplete() {}
});

For HTTP clients, helios-runtime exposes the same events as Server-Sent Events under GET /sessions/{id}/events.
helios-core also ships generic event-sink primitives (EventSink, JsonlEventSink, CollectingEventSink in ai.singlr.core.events) plus a Trace / Span value-type hierarchy in ai.singlr.core.trace for custom collectors. These are currently inert for AgentSession runs — wire them via your own hook or stream subscriber.
The helios-repl module runs Java code in a JVM subprocess sandbox brokering access to the host via a small set of host functions. The substrate is JvmSandbox + ReplSession + CodeExecutionTool + the HostFunction registry. CodeActPreset is the v2 way to wire that substrate into an AgentSession for tool-using sessions where the agent needs general computation:
record Input(String query, List<String> documents) {}
record Output(String answer, List<String> sources, int totalCount) {}
try (var executionProvider = JShellExecutionProvider.singleSandbox(ReplConfig.newBuilder().build(), null);
var session = AgentSession.create(
SessionOptions.newBuilder()
.withModel(model)
.withExecutionProvider(executionProvider)
.apply(CodeActPreset.typed(Input.class, Output.class,
new Input("what is helios?", docs)))
.build())) {
Output answer = session.runBlocking(
UserMessage.text("Answer the query."),
OutputSchema.of(Output.class));
}

CodeActPreset.withSubLm(I, O, input, subModel) adds in-sandbox predict() / submit() host functions for RLM-style fan-out: code owns loops and aggregation, the sub-LM owns judgment with fresh context per call.
Sandbox API — the host functions you register:
| Function | Purpose | Security |
|---|---|---|
| Custom host functions you register | Whatever your app needs | Argv-validated, registered before sandbox boot, frozen at startup |
| predict(instructions, input) | Call model with fresh context (via CodeActPreset.withSubLm) | Host controls which model; per-session call budget |
| submit(output) | Return structured final result (via CodeActPreset.typed) | Single-call enforced; validates against OutputSchema |
Credentials never enter the sandbox. Variables persist across execute_code calls; printed output is truncated when shown to the model (default 5000 chars) so long results stay in sandbox variables instead of bloating the transcript.
When a CodeActPreset.typed session runs with a record input, every top-level field is pre-bound as a typed JShell var before the model writes any code. Given record Stats(List<Integer> numbers, String operation), the model can write numbers.size() or operation.equals("sum") directly — no JSON parsing.
Every sandbox boots with a JShell preamble that adds standard imports (java.util.*, java.util.stream.*, Collectors, java.io.*, java.math.*, java.time.*), free print / println / printf (no System.out), and ten script-style helpers: sum, sumInts, mean, max, min, join, filter, map, sorted, countBy. So println(sum(numbers)) replaces System.out.println(numbers.stream().mapToInt(Integer::intValue).sum()).
Custom host functions you register get typed static JShell wrappers synthesised into the prelude — HostFunction("marketQuote", [HostParameter.required("ticker", STRING, ...)], handler) becomes callable as marketQuote("AAPL") from emitted Java code.
Local vector embeddings via ONNX Runtime. Models download from HuggingFace on first use and are cached locally.
try (var model = EmbeddingProvider.resolve(
OnnxModelId.NOMIC_EMBED_V1_5.id(), EmbeddingConfig.defaults())) {
float[] vector = model.embed("A man is eating food.").getOrThrow();
}

Supported: NOMIC_EMBED_V1_5 (768-dim encoder, 8192 tokens), EMBEDDING_GEMMA_300M (768-dim decoder, 2048 tokens), HARRIER_OSS_V1_270M (640-dim multilingual decoder, 32768 tokens), HARRIER_OSS_V1_0_6B (1024-dim multilingual decoder, 32768 tokens).
Each model ships with sensible default query/document prefixes; pass a custom prefix per call when a different task shape needs different instructions:
model.embedQuery(query, "Instruct: Given a brief professional summary, find the matching profile\nQuery: ");
model.embedDocument(doc, null); // null = use the spec default

core.runtime ships the durability primitives (RunStore, ToolCallJournal, UnsafeResumePolicy, Durability) that crash-safe execution composes on. Production-grade impls in helios-persistence (PgRunStore + PgToolCallJournal via Helidon DbClient); in-process impls (InMemoryRunStore, InMemoryToolCallJournal) for tests.
Operations:
durability.runStore().purgeOlderThan(Duration.ofDays(30)); // cascade-deletes journal entries
DurableResumeScanner.builder(durability)
.register(...)
.build()
.scan(); // sweep stale runs and resume; wire into io.helidon.scheduling

Session integration is on the v2 roadmap. The primitives are stable and the schema is shaped so v2 distributed-lease columns (worker_id, lease_until) can be added additively. The first durable AgentSession example will land alongside the wiring.
PostgreSQL-backed PromptRegistry, TraceStore, and durability impls. All share a PgConfig carrying the DbClient, schema name, and optional agent ID.
var pgConfig = PgConfig.newBuilder()
.withDbClient(dbClient)
.withSchema("helios")
.build();
var traceStore = new PgTraceStore(pgConfig);
var promptRegistry = new PgPromptRegistry(pgConfig);
var durability = PgDurability.of(pgConfig); // RunStore + ToolCallJournal

Schema lives on the classpath at ai/singlr/persistence/schema.sql; run it against your database to create the helios_* tables. Optional custom schema prefix is applied to all generated SQL; the schema name is validated against Postgres' unquoted-identifier shape ([A-Za-z_][A-Za-z0-9_]{0,62}) at PgConfig construction.
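The identifier check is simple enough to show in isolation. A minimal sketch using the regex quoted above; SchemaNames and its methods are hypothetical names, and how PgConfig reports a rejection is not shown in this README, so the exception type here is an assumption.

```java
import java.util.regex.Pattern;

final class SchemaNames {
  // Postgres unquoted-identifier shape: letter or underscore, then up to 62 more word chars.
  private static final Pattern UNQUOTED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]{0,62}");

  static boolean isValid(String schema) {
    return schema != null && UNQUOTED.matcher(schema).matches();
  }

  /** Qualifies a table name only after the schema passes validation. */
  static String qualify(String schema, String table) {
    if (!isValid(schema)) {
      throw new IllegalArgumentException("invalid schema name: " + schema);
    }
    return schema + "." + table;
  }

  public static void main(String[] args) {
    System.out.println(qualify("helios", "helios_traces"));
    System.out.println(isValid("helios; DROP TABLE helios_traces"));
  }
}
```

Because the whole string must match, anything carrying whitespace, quotes, or semicolons is refused before it reaches string substitution.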
Trace fidelity by design. Trace and journal payloads (helios_tool_calls.{args, output, error}, helios_traces.{input_text, output_text, attributes}, helios_spans.attributes) are persisted verbatim. That's what makes the audit trail useful for evals and debugging — placeholders would replace the actual signal. Redaction is the deployer's call: tool sources like CommandGrant, ReadTool / GrepTool (when wired with a Redactor), and JShellExecutionProvider already redact through any SecretRegistry you pass them, so traces capture the already-redacted output. For custom tools that handle registered secrets, redact at the tool boundary before returning. Deployers who want defense-in-depth at the persistence boundary can opt in via PgConfig.Builder.withRedactor(registry.redactor()) — every trace/journal text field and JSON-serialised attribute map runs through the supplied Redactor before reaching the DB. Default unset = verbatim (current behaviour).
Helios is designed for production but has known limitations — see Production Readiness above for the full hardening surface and the deferred items. This section documents the security primitives library users build on.
core.common.SecretRegistry is a thread-safe registry of secret values; Redactor (built via registry.redactor()) is an immutable Aho-Corasick byte-level scrubber that replaces every contiguous occurrence of a registered secret with <redacted:NAME>. Operates on raw bytes BEFORE UTF-8 decode so encoding mangling cannot bypass it. Validation: secrets must be ≥8 chars and pure ASCII. Overlap policy: leftmost-longest.
var registry = new SecretRegistry();
registry.register("STRIPE_KEY", System.getenv("STRIPE_KEY"));
registry.register("GH_TOKEN", System.getenv("GH_TOKEN"));Share one registry across every component that produces model-visible or operator-visible output:
- CommandGrant.builder("gh").withSecretRegistry(registry): redacts stdout / stderr; refuses argv carrying a registered secret.
- ReadTool.binding(workspace, tracker, registry.redactor()) and GrepTool.binding(workspace, registry.redactor()): redact text-body output and match content respectively. A token a CommandGrant("gh") writes to a file stays redacted when the agent later reads it via Read.
- JShellExecutionProvider.Builder.withSecretRegistry(registry): redacts subprocess stdout / stderr, and wraps the deployer's SandboxBindingsListener so operator telemetry of working memory also redacts registered secrets.
- ModelConfig.toString(): automatically elides the apiKey and every header value regardless of registry contents; safe to log a ModelConfig directly.

The persistence layer does not route through the registry by default; see Known limitations for the opt-in PgConfig.Builder.withRedactor(...) hook.
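For intuition about byte-level, leftmost-longest redaction, here is a deliberately naive sketch. It swaps Aho-Corasick for a plain scan over every registered secret at each offset, so it shows the output contract (<redacted:NAME>, longest match wins) but not the performance characteristics. NaiveRedactor is a hypothetical class, not the library's Redactor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class NaiveRedactor {
  private final Map<String, byte[]> secrets = new LinkedHashMap<>();

  void register(String name, String value) {
    if (value == null || value.length() < 8 || !value.chars().allMatch(c -> c < 128)) {
      throw new IllegalArgumentException("secret must be at least 8 ASCII chars: " + name);
    }
    secrets.put(name, value.getBytes(StandardCharsets.US_ASCII));
  }

  /** Scans raw bytes left to right; at each offset the longest matching secret wins. */
  byte[] redact(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
    int i = 0;
    while (i < input.length) {
      String bestName = null;
      int bestLen = 0;
      for (Map.Entry<String, byte[]> entry : secrets.entrySet()) {
        byte[] secret = entry.getValue();
        if (secret.length > bestLen && i + secret.length <= input.length
            && Arrays.equals(input, i, i + secret.length, secret, 0, secret.length)) {
          bestName = entry.getKey();
          bestLen = secret.length;
        }
      }
      if (bestName != null) {
        out.writeBytes(("<redacted:" + bestName + ">").getBytes(StandardCharsets.UTF_8));
        i += bestLen;
      } else {
        out.write(input[i++]);
      }
    }
    return out.toByteArray();
  }

  /** Convenience wrapper: redact the UTF-8 bytes, then decode. */
  String redact(String text) {
    return new String(redact(text.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    var redactor = new NaiveRedactor();
    redactor.register("GH_TOKEN", "ghp_abcdefgh1234");
    System.out.println(redactor.redact("token=ghp_abcdefgh1234 end"));
  }
}
```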
Defense in depth, not a single boundary:
1. OS-level sandbox (deployer's responsibility): Incus, Docker, gVisor, etc. This is the authoritative escape boundary.
2. JVM subprocess sandbox (JvmSandbox): the model's code runs in a separate JVM with cleared environment, system-properties stripped from parent JVM args, classpath/modulepath inherited but agents (-javaagent, -Dauth.token=…) filtered, descendants reaped on close / shutdown hook, single-execute serialised via Semaphore(1).
3. JShell prelude controls: typed wrappers around HostFunctions, reserved-name skip on synthesiser, frozen HostFunctionRegistry after sandbox boot.
The sandbox subprocess RPC runs on a per-session Unix domain socket bound in a private temp directory (mode 0700) — the host accepts exactly one connection from the subprocess on startup and then closes the listener and deletes the socket file. A JShell snippet that obtains a raw PrintStream to FileDescriptor.out can write whatever it wants to subprocess stdout; the host reads stdout into the execution result's captured-output buffer but never parses it as RPC, so forged frames are inert text. Layer 1 (OS-level sandboxing) remains the authoritative escape boundary against everything outside the documented Java API surface (native code via FFM with --enable-native-access, reflective access via --add-opens, etc.).
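The accept-once pattern can be sketched with the JDK's Unix domain socket support (Java 16+, POSIX file systems). This is an illustration under assumptions, not JvmSandbox's code: SingleAcceptRpcSocket is a hypothetical name, and both ends of the connection live in one process here purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class SingleAcceptRpcSocket {

  /** Returns true when one peer connected and a later connection attempt was refused. */
  static boolean demo() {
    try {
      // Private directory (mode 0700) so other users cannot reach the socket file.
      Path dir = Files.createTempDirectory("rpc-",
          PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));
      Path socketFile = dir.resolve("rpc.sock");
      UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
      try (ServerSocketChannel listener = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
        listener.bind(address);
        try (SocketChannel subprocessSide = SocketChannel.open(address); // stands in for the child JVM
             SocketChannel hostSide = listener.accept()) {
          // Exactly one connection accepted: close the listener and remove the socket file.
          listener.close();
          Files.deleteIfExists(socketFile);
          return hostSide.isConnected() && !canConnect(address);
        }
      } finally {
        Files.deleteIfExists(socketFile);
        Files.deleteIfExists(dir);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static boolean canConnect(UnixDomainSocketAddress address) {
    try (SocketChannel extra = SocketChannel.open(address)) {
      return true;
    } catch (IOException e) {
      return false; // expected once the listener is closed and the file is gone
    }
  }

  public static void main(String[] args) {
    System.out.println("single-accept enforced: " + demo());
  }
}
```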
Every filesystem boundary in the framework refuses traversal. Lexical normalise + startsWith(root) first (to reject .. and absolute paths without dereferencing), then toRealPath() to refuse symlink escapes:
- WorkspaceRoot (backing the session's Read / Grep / Glob tools at the workspace or any curated-corpus root)
- CommandGrant (per-call temp cwd unless explicit withCwd)
- OnnxModelDownloader (HF-supplied file paths against the local model cache)
- PgConfig (Postgres schema name validated against unquoted-identifier shape)
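The two-step check (lexical first, then toRealPath) can be sketched with java.nio.file alone. PathJail is a hypothetical class for illustration; WorkspaceRoot's actual API and exception types are not shown in this README. The demo assumes a file system that supports symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

final class PathJail {
  private final Path root;

  PathJail(Path root) throws IOException {
    this.root = root.toRealPath(); // canonical root, so later comparisons are like-for-like
  }

  Path resolve(String requested) throws IOException {
    Path candidate = Path.of(requested);
    if (candidate.isAbsolute()) throw new SecurityException("absolute path refused");
    // Step 1: purely lexical, nothing on disk is dereferenced yet.
    Path lexical = root.resolve(candidate).normalize();
    if (!lexical.startsWith(root)) throw new SecurityException("traversal refused");
    // Step 2: follow symlinks and re-check against the root.
    Path real = lexical.toRealPath();
    if (!real.startsWith(root)) throw new SecurityException("symlink escape refused");
    return real;
  }

  boolean allows(String requested) {
    try {
      resolve(requested);
      return true;
    } catch (SecurityException | IOException | InvalidPathException e) {
      return false;
    }
  }

  static boolean demo() {
    try {
      Path dir = Files.createTempDirectory("jail-");
      Files.writeString(dir.resolve("notes.txt"), "hello");
      Path outside = Files.createTempFile("outside-", ".txt");
      Files.createSymbolicLink(dir.resolve("link"), outside); // points outside the jail
      PathJail jail = new PathJail(dir);
      return jail.allows("notes.txt")
          && !jail.allows("../outside.txt")
          && !jail.allows(outside.toString())
          && !jail.allows("link");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("jail holds: " + demo());
  }
}
```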
RuntimeServer.Builder binds 127.0.0.1 by default (loopback only). The routes are unauthenticated; expose externally only behind authenticated fronting infrastructure:
var server = RuntimeServer.builder()
.withRegistry(SessionRegistry.inMemory())
.withOptionsFactory(sessionId -> ...)
// .withHost("0.0.0.0") // opt-in for external traffic; document your auth story first
.build();

- No magic: explicit wiring, no annotation-driven DI.
- Records everywhere: immutable data, pattern matching, sealed types (Result<T>, QueryEvent, ResultMessage, HookOutcome).
- Builder pattern: with prefix, static newBuilder() factory.
- JPMS modules: proper encapsulation, ServiceLoader SPI for providers.
- Production from day 1: fault tolerance, tracing, JaCoCo coverage gates on helios-core and helios-session at 95% instruction / 90% branch.
- Currency is integer micro-USD: long microUsd (Stripe-style fixed-precision). BigDecimal only at the display boundary.
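A small sketch of the micro-USD convention: arithmetic stays in long, and BigDecimal appears only when formatting. MicroUsd, the per-million-token price unit, and the round-up rule are assumptions made for this example, not the CostCalculator API.

```java
import java.math.BigDecimal;

final class MicroUsd {

  /** Cost in micro-USD for `tokens` at a price quoted in micro-USD per million tokens. */
  static long cost(long tokens, long microUsdPerMillionTokens) {
    // multiplyExact fails loudly on overflow; ceilDiv rounds up (an assumption of this sketch).
    return Math.ceilDiv(Math.multiplyExact(tokens, microUsdPerMillionTokens), 1_000_000L);
  }

  /** Display boundary: the only place a decimal type is involved. */
  static String display(long microUsd) {
    return "$" + BigDecimal.valueOf(microUsd, 6).toPlainString();
  }

  public static void main(String[] args) {
    long spend = cost(1_500, 3_000_000L); // 1,500 tokens at $3 per million tokens
    System.out.println(spend + " micro-USD = " + display(spend));
  }
}
```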
mvn package
mvn spotless:apply # auto-format (Google Java Format, 2-space indent)