4 changes: 2 additions & 2 deletions .github/pull_request_template.md
@@ -55,7 +55,7 @@ Tessl-authenticated checks:

- [ ] `bash scripts/check_publish_dry_run.sh .`
- [ ] `tessl plugin publish --dry-run --bump patch .`
-- [ ] `tessl skill review --threshold 100 skills/java-optionals/SKILL.md`, if skill text or references changed
+- [ ] `tessl review run --workspace martinfrancois --threshold 100 skills/java-optionals/SKILL.md`, if skill text or references changed
- [ ] Targeted main/reference `scripts/run_eval_suite.sh <main|reference> <scenario-name>`, if skill behavior or those evals changed
- [ ] Targeted regression `scripts/run_eval_suite.sh regression <scenario-name>`, if regression evals changed
- [ ] Every substantively changed eval scenario was rerun targeted and reached 100% with context, or the PR explains the Tessl blocker and remaining work
@@ -65,7 +65,7 @@ Tessl-authenticated checks:
- [ ] `scripts/classify_eval_result.py <run-json> --scenario-dir <scenario-dir>`, if a scenario was added or moved between suites
- [ ] Full/main `scripts/run_eval_suite.sh main`, if benchmark claims changed or targeted with-context results are clean

-`bash scripts/check_publish_dry_run.sh .`, `tessl skill review`, and hosted Tessl evals require
+`bash scripts/check_publish_dry_run.sh .`, `tessl review run`, and hosted Tessl evals require
Tessl authentication. Hosted evals also require a linked Tessl project. If you can't run one of
them, leave it unchecked and explain why in the details.

2 changes: 1 addition & 1 deletion .github/workflows/skill-review.yml
@@ -44,4 +44,4 @@ jobs:

- name: Review skill
if: ${{ env.TESSL_TOKEN_AVAILABLE == 'true' }}
-run: tessl skill review --threshold 100 skills/java-optionals/SKILL.md
+run: tessl review run --workspace martinfrancois --threshold 100 skills/java-optionals/SKILL.md
6 changes: 5 additions & 1 deletion docs/agents/evals.md
@@ -148,7 +148,11 @@ benchmark claims, or scoring rules.
Current active suite structure:

- `evals/`: 4 scenarios, 360 checklist points, 3 natural and 1 explicit.
-- `evals-reference/`: 46 scenarios, 2470 checklist points, broad candidate and diagnostic coverage.
+- `evals-reference/`: 52 scenarios, 3070 checklist points, broad candidate and diagnostic coverage.
+  Reference numbers `51` through `56` cover the July 2026 open-issue sweep for presence-to-enum
+  selection, findAny/findFirst Optional terminals, domain selections with lazy fallback,
+  side-effecting upsert boundaries, ifPresentOrElse rendering branches, and lifecycle Optional
+  helper boundaries.
- `evals-regression/`: 2 scenarios, 200 checklist points, with-context safety coverage.

## Checks
2 changes: 1 addition & 1 deletion docs/agents/workflow.md
@@ -105,7 +105,7 @@ release-readiness.
- Run the Tessl skill review at threshold 100 when changing runtime skill content:

```bash
-tessl skill review --threshold 100 skills/java-optionals/SKILL.md
+tessl review run --workspace martinfrancois --threshold 100 skills/java-optionals/SKILL.md
```

- Pull request titles and commits must use Conventional Commits. Release Please uses them to update
2 changes: 2 additions & 0 deletions evals-reference/51-presence-selects-enum-value/capability.txt
@@ -0,0 +1,2 @@
Refactor Optional presence checks that only select enum values while preserving side-effecting
branches as explicit code.
54 changes: 54 additions & 0 deletions evals-reference/51-presence-selects-enum-value/criteria.json
@@ -0,0 +1,54 @@
{
"context": "Reference cleanup: Optional presence selecting between enum values should be expressed as Optional value flow when no side effects or checked work are involved.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Creates coherent Java 17 artifact",
"category": "safety",
"max_score": 8,
"description": "Returns complete Java 17-compatible BoardSetupChoices code with imports, enum, interfaces, and methods intact."
},
{
"name": "Uses Optional map for enum selection",
"category": "optional_quality",
"max_score": 30,
"description": "Replaces the dry-run isPresent ternary with options.existingBoardId().map(...).orElse(...) or equivalent Optional value flow."
},
{
"name": "Preserves ignored present value",
"category": "optional_quality",
"max_score": 12,
"description": "Correctly ignores the contained board id when mapping any present value to BoardSetupChoice.EXISTING."
},
{
"name": "Uses appropriate eager constant fallback",
"category": "optional_quality",
"max_score": 12,
"description": "Uses orElse for the trivial BoardSetupChoice.NEW constant fallback rather than noisy orElseGet when no work is deferred."
},
{
"name": "Keeps side-effect branch imperative",
"category": "safety",
"max_score": 16,
"description": "Does not force choiceWithAudit into an Optional chain that hides auditLog.record or changes side-effect ordering."
},
{
"name": "Avoids Optional antipatterns",
"category": "optional_quality",
"max_score": 14,
"description": "Does not introduce get(), orElseThrow(), orElse(null), fake lists, or a custom helper just to select the enum."
},
{
"name": "Keeps behavior focused",
"category": "maintainability",
"max_score": 8,
"description": "Does not change enum values, method signatures, rejectNewBoardInProgress calls, or unrelated setup behavior."
}
],
"metadata": {
"invocation": "natural",
"task_type": "cleanup",
"evidence_type": "ordinary_lift",
"issue": "https://github.com/martinfrancois/java-optionals-skill/issues/74"
}
}
45 changes: 45 additions & 0 deletions evals-reference/51-presence-selects-enum-value/task.md
@@ -0,0 +1,45 @@
# Clean up enum selection

Create `BoardSetupChoices.java` with the refactored class. Improve value flow where appropriate.
Assume Java 17.

Return the complete revised Java code only.

```java
import java.util.Optional;

final class BoardSetupChoices {
void rejectDryRunNewBoardInProgress(LocalSetupOptions options) {
BoardSetupChoice dryRunChoice =
options.existingBoardId().isPresent() ? BoardSetupChoice.EXISTING : BoardSetupChoice.NEW;
rejectNewBoardInProgress(options, dryRunChoice);
}

BoardSetupChoice choiceWithAudit(LocalSetupOptions options, AuditLog auditLog) {
if (options.existingBoardId().isPresent()) {
auditLog.record("existing board selected");
return BoardSetupChoice.EXISTING;
}
return BoardSetupChoice.NEW;
}

private void rejectNewBoardInProgress(LocalSetupOptions options, BoardSetupChoice choice) {}

enum BoardSetupChoice {
EXISTING,
NEW
}

interface LocalSetupOptions {
Optional<String> existingBoardId();
}

interface AuditLog {
void record(String message);
}
}
```

The present board id value is intentionally ignored in the dry-run choice. Preserve side effects and
enum values. Keep `choiceWithAudit` as an explicit imperative branch; do not hide
`auditLog.record(...)` inside an Optional `map` or other transformation callback.
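
For orientation, here is a minimal sketch of the value flow the checklist for this scenario describes. It is an illustration under assumptions, not the expected answer: the class name `BoardSetupChoiceSketch` is hypothetical, the methods take the `Optional<String>` directly instead of `LocalSetupOptions`, and a plain `List<String>` stands in for `AuditLog`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch class; simplified signatures compared with the task fixture.
final class BoardSetupChoiceSketch {
    enum BoardSetupChoice { EXISTING, NEW }

    // Pure selection: any present id maps to EXISTING, and the id itself is ignored.
    static BoardSetupChoice dryRunChoice(Optional<String> existingBoardId) {
        return existingBoardId
                .map(ignoredId -> BoardSetupChoice.EXISTING)
                .orElse(BoardSetupChoice.NEW); // constant fallback, no work to defer
    }

    // Side-effecting selection stays imperative so the audit write remains visible.
    // The List<String> is an assumed stand-in for the AuditLog interface.
    static BoardSetupChoice choiceWithAudit(Optional<String> existingBoardId, List<String> auditLog) {
        if (existingBoardId.isPresent()) {
            auditLog.add("existing board selected");
            return BoardSetupChoice.EXISTING;
        }
        return BoardSetupChoice.NEW;
    }

    public static void main(String[] args) {
        List<String> auditLog = new ArrayList<>();
        System.out.println(dryRunChoice(Optional.of("board-1")));              // EXISTING
        System.out.println(dryRunChoice(Optional.empty()));                    // NEW
        System.out.println(choiceWithAudit(Optional.of("board-1"), auditLog)); // EXISTING
        System.out.println(auditLog);                                          // [existing board selected]
    }
}
```

`orElse` is used here because the fallback is an enum constant; `orElseGet` would only matter if the fallback required computation.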
@@ -0,0 +1 @@
Audit Optional-producing stream lookups and choose findAny only when all matches are equivalent.
@@ -0,0 +1,48 @@
{
"context": "Reference cleanup: Optional-returning stream terminals should use findAny for equivalent matches and preserve findFirst for ordered first-match contracts.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Creates coherent Java 17 artifact",
"category": "safety",
"max_score": 8,
"description": "Returns complete Java 17-compatible OptionalLookupTerminals code with imports, helper, methods, and record intact."
},
{
"name": "Uses findAny for equivalent matches",
"category": "optional_quality",
"max_score": 30,
"description": "Changes exact and normalized configured-name lookups to findAny because the Optional result does not have an encounter-order contract."
},
{
"name": "Preserves ordered findFirst contracts",
"category": "optional_quality",
"max_score": 24,
"description": "Keeps findFirst for PATH-style executable lookup and first Java version line, where encounter order selects the value."
},
{
"name": "Explains retained findFirst calls",
"category": "maintainability",
"max_score": 12,
"description": "Adds concise comments or equivalent explanation for retained findFirst calls based on semantic order, not current sequential-stream behavior."
},
{
"name": "Avoids mechanical terminal changes",
"category": "safety",
"max_score": 14,
"description": "Does not replace every findFirst blindly and does not preserve every findFirst merely because the code already works."
},
{
"name": "Preserves Optional flattening",
"category": "safety",
"max_score": 12,
"description": "Keeps Optional-returning parsing flattened correctly with Optional::stream or equivalent and does not introduce get(), null, or fake-list handling."
}
],
"metadata": {
"invocation": "natural",
"task_type": "cleanup",
"evidence_type": "ordinary_lift",
"issue": "https://github.com/martinfrancois/java-optionals-skill/issues/72"
}
}
56 changes: 56 additions & 0 deletions evals-reference/52-findany-equivalent-optional-matches/task.md
@@ -0,0 +1,56 @@
# Audit Optional stream lookup terminals

Refactor `OptionalLookupTerminals.java` only where the terminal operation better communicates the
contract. Assume Java 17.

Return the revised Java code and one brief comment beside each retained `findFirst()` explaining why
the first match is semantically required.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class OptionalLookupTerminals {
static Optional<String> detectedList(List<String> openListNames, String expectedName) {
return openListNames.stream()
.filter(name -> name.equalsIgnoreCase(expectedName))
.findFirst();
}

static Optional<BoardList> targetList(List<BoardList> lists, String configuredName) {
String expected = normalize(configuredName);
return lists.stream()
.filter(list -> !list.closed())
.filter(list -> normalize(list.name()).equals(expected))
.findFirst();
}

static Optional<Path> firstExecutable(List<Path> searchPath, String commandName) {
return searchPath.stream()
.map(path -> path.resolve(commandName))
.filter(path -> path.toFile().exists())
.findFirst();
}

static Optional<Integer> firstJavaMajor(String output) {
return output.lines()
.map(String::stripLeading)
.filter(line -> line.startsWith("java "))
.map(OptionalLookupTerminals::firstInteger)
.flatMap(Optional::stream)
.findFirst();
}

private static Optional<Integer> firstInteger(String value) {
return Optional.empty();
}

private static String normalize(String value) {
return value.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").strip();
}

record BoardList(String id, String name, boolean closed) {}
}
```
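
The distinction this scenario's checklist draws can be sketched as follows. This is a reduced illustration, not the expected answer: `LookupTerminalSketch` and `firstMatchingLine` are hypothetical names, and the second method is a simplified stand-in for the ordered lookups in the fixture.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch class contrasting the two terminal operations.
final class LookupTerminalSketch {
    record BoardList(String id, String name, boolean closed) {}

    // Any open list with the configured name is acceptable, so the result
    // carries no encounter-order contract and findAny states that directly.
    static Optional<BoardList> targetList(List<BoardList> lists, String configuredName) {
        String expected = normalize(configuredName);
        return lists.stream()
                .filter(list -> !list.closed())
                .filter(list -> normalize(list.name()).equals(expected))
                .findAny();
    }

    // Earlier lines take precedence over later ones, so the first match in
    // encounter order is part of the contract and findFirst must stay.
    static Optional<String> firstMatchingLine(List<String> lines, String prefix) {
        return lines.stream()
                .map(String::stripLeading)
                .filter(line -> line.startsWith(prefix))
                .findFirst();
    }

    private static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").strip();
    }

    public static void main(String[] args) {
        List<BoardList> lists = List.of(
                new BoardList("1", "Done", true),
                new BoardList("2", "In  Progress", false));
        // Exactly one list matches, so the result is deterministic.
        System.out.println(targetList(lists, "in progress").map(BoardList::id)); // Optional[2]
        System.out.println(firstMatchingLine(
                List.of("openjdk 17", "  java 21", "java 17"), "java "));        // Optional[java 21]
    }
}
```

On a sequential stream `findAny` usually returns the first match as well, but that is an implementation detail and not a guarantee, which is why the choice of terminal should follow the contract instead of observed behavior.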
@@ -0,0 +1,2 @@
Build domain selections directly from Optional present and absent branches while keeping expensive
fallbacks lazy and checked prompt boundaries explicit.
54 changes: 54 additions & 0 deletions evals-reference/53-domain-selection-lazy-fallback/criteria.json
@@ -0,0 +1,54 @@
{
"context": "Reference implementation cleanup: Optional selections should build domain objects in one present/absent flow and keep fallback work lazy.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Creates coherent Java 25 artifact",
"category": "safety",
"max_score": 8,
"description": "Returns complete Java 25-compatible SetupSelections code with imports, constructor, methods, interfaces, and records intact."
},
{
"name": "Builds MaxAgentsSelection in Optional flow",
"category": "optional_quality",
"max_score": 22,
"description": "Uses workflowConfig.maxAgents(workflowPath).map(...).orElseGet(...) or equivalent to construct MaxAgentsSelection with value and provenance together."
},
{
"name": "Preserves explicit override behavior",
"category": "safety",
"max_score": 12,
"description": "Keeps explicit max-agents input winning before workflow fallback and preserves preservedFromWorkflow=false for explicit or default values."
},
{
"name": "Transforms Codex defaults directly",
"category": "optional_quality",
"max_score": 18,
"description": "Replaces isEmpty plus orElseThrow ordinary value flow with codexModelDefaults().map(...).orElse(boardSetup) or equivalent."
},
{
"name": "Keeps dotenv fallback lazy",
"category": "optional_quality",
"max_score": 18,
"description": "Uses Optional.or or equivalent lazy fallback so load(dotenv) runs only when no environment value is present."
},
{
"name": "Preserves checked prompt boundary",
"category": "safety",
"max_score": 12,
"description": "Keeps promptedMaxAgents as a clear checked-IOException branch instead of forcing readLine through a generic Optional helper."
},
{
"name": "Avoids Optional workarounds",
"category": "optional_quality",
"max_score": 10,
"description": "Does not use orElse(null), fake Optional lists or streams, generic throwing helpers, or repeated isPresent plus value reads for ordinary value flow."
}
],
"metadata": {
"invocation": "natural",
"task_type": "cleanup",
"evidence_type": "ordinary_lift",
"issue": "https://github.com/martinfrancois/java-optionals-skill/issues/71"
}
}
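
The two lazy-fallback items in this checklist can be sketched as below. The scenario's `task.md` is not shown here, so everything except the `MaxAgentsSelection` name and its `preservedFromWorkflow` flag is an assumption: the class name, method names, and parameter shapes are hypothetical, and `Supplier` arguments stand in for the real default and dotenv loading.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch class; signatures are assumed, not taken from the fixture.
final class LazyFallbackSketch {
    record MaxAgentsSelection(int value, boolean preservedFromWorkflow) {}

    // Builds the selection in one present/absent flow so the value and its
    // provenance flag are always produced together.
    static MaxAgentsSelection select(Optional<Integer> workflowMaxAgents,
                                     Supplier<Integer> defaultMaxAgents) {
        return workflowMaxAgents
                .map(value -> new MaxAgentsSelection(value, true))
                .orElseGet(() -> new MaxAgentsSelection(defaultMaxAgents.get(), false));
    }

    // Optional.or (Java 9+) invokes the supplier only when the first Optional is empty.
    static Optional<String> setting(Optional<String> fromEnvironment,
                                    Supplier<Optional<String>> loadDotenv) {
        return fromEnvironment.or(loadDotenv);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        Supplier<Optional<String>> dotenv = () -> {
            loads.incrementAndGet(); // counts how often the fallback actually runs
            return Optional.of("from-dotenv");
        };

        System.out.println(setting(Optional.of("from-env"), dotenv)); // Optional[from-env]
        System.out.println(loads.get());                              // 0
        System.out.println(setting(Optional.empty(), dotenv));        // Optional[from-dotenv]
        System.out.println(loads.get());                              // 1

        System.out.println(select(Optional.of(4), () -> 1));
        System.out.println(select(Optional.empty(), () -> 1));
    }
}
```

Replacing `or(loadDotenv)` with `or(() -> Optional.of(...))` built eagerly outside the lambda, or with `orElse(...)`, would run the fallback work even when the environment value is present, which is the behavior the checklist item guards against.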