6 changes: 5 additions & 1 deletion docs/agents/evals.md
@@ -265,7 +265,11 @@ Update this section whenever active eval membership or scoring changes.
bodies into helpers. Its high multi-line-lambda criterion weight is intentional focused
behavior-delta coverage, not ordinary broad lift evidence.
- Hard-stop scan audits: regression explicit workflow-use only.
- Reference suite: 6 scenarios, 560 total checklist points. Deleted reference number 12 and
- Reference suite: 20 scenarios, 1960 total checklist points. Reference numbers `29` through `42`
cover the open-issue sweep for bounded duplicate lookup, findAny audits, immutable/result
collection boundaries, predicate loops, parser-preserving streams, collector rationale, formatting,
identity mappers, batched lookup phases, mapMulti extraction, tail allMatch checks, and forEach
side-effect classification. Deleted reference number 12 and
regression-moved scenarios are not counted.
- Regression suite: 19 scenarios, 1820 total checklist points.
- Hosted benchmark evidence is pending rerun for the current active suite. Do not publish exact
@@ -0,0 +1,2 @@
Refactor duplicate-aware Java stream lookups so they inspect at most two matches while preserving
zero, one, and ambiguous-match behavior.
@@ -0,0 +1,56 @@
{
  "context": "Reference focused cleanup: duplicate-aware lookup helpers should use bounded stream collection without accepting ambiguous matches.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Creates coherent Java 21 artifact",
      "category": "safety",
      "max_score": 8,
      "description": "Returns a complete revised ChecklistLookup.java snippet with necessary imports and Java 21-compatible code."
    },
    {
      "name": "Uses bounded duplicate detection",
      "category": "stream_quality",
      "max_score": 28,
      "description": "Filters matching values, limits the stream to at most two matches, materializes only those bounded matches, and branches on zero, one, or ambiguous matches."
    },
    {
      "name": "Rejects findFirst shortcut",
      "category": "stream_quality",
      "max_score": 18,
      "description": "Does not replace the loop with findFirst, findAny, or an equivalent first-match shortcut that would silently accept duplicate matches."
    },
    {
      "name": "Shares the repeated branch carefully",
      "category": "maintainability",
      "max_score": 16,
      "description": "Extracts a small generic helper for the shared zero/one/ambiguous branch, or otherwise removes meaningful duplication without hiding the domain-specific predicate, error code, or message."
    },
    {
      "name": "Preserves matching behavior",
      "category": "safety",
      "max_score": 12,
      "description": "Keeps Objects.equals-style null-safe matching for checklist names and item text, preserves input encounter order for the single returned match, and still returns null when no match exists."
    },
    {
      "name": "Preserves exceptions",
      "category": "safety",
      "max_score": 10,
      "description": "Keeps the stable TrelloException type, duplicate error codes, and duplicate messages for both helper methods."
    },
    {
      "name": "Avoids over-engineering",
      "category": "maintainability",
      "max_score": 8,
      "description": "Does not introduce broad lookup frameworks, caches, parallel streams, or unrelated API changes."
    }
  ],
  "metadata": {
    "invocation": "natural",
    "task_type": "cleanup",
    "evidence_type": "focused_reference",
    "issue": "https://github.com/martinfrancois/java-streams-skill/issues/53",
    "reference_selection": "Focused issue #53 coverage for bounded duplicate-detection stream lookups.",
    "runtime_reference_overlap_rationale": "Allowed only as reference-suite focused coverage; do not report as ordinary broad lift if runtime references later teach this same shape."
  }
}
69 changes: 69 additions & 0 deletions evals-reference/29-bounded-duplicate-detection-stream/task.md
@@ -0,0 +1,69 @@
# Refactor duplicate-aware lookups

Refactor `ChecklistLookup.java` with a stream-based implementation. Assume Java 21.

Return the revised Java code only.

```java
import java.util.List;
import java.util.Objects;

final class ChecklistLookup {
    static Card.Checklist singleChecklistByName(List<Card.Checklist> checklists, String checklistName) {
        Card.Checklist match = null;
        for (Card.Checklist checklist : checklists) {
            if (!Objects.equals(checklist.name(), checklistName)) {
                continue;
            }
            if (match != null) {
                throw new TrelloException(
                        "trello_checklist_ambiguous",
                        "Multiple Trello checklists match the requested checklist_name.");
            }
            match = checklist;
        }
        return match;
    }

    static Card.ChecklistItem singleCheckItemByName(Card.Checklist checklist, String itemName) {
        Card.ChecklistItem match = null;
        for (Card.ChecklistItem item : checklist.items()) {
            if (!Objects.equals(item.text(), itemName)) {
                continue;
            }
            if (match != null) {
                throw new TrelloException(
                        "trello_check_item_ambiguous",
                        "Multiple Trello checklist items match the requested item_name.");
            }
            match = item;
        }
        return match;
    }

    record Card(List<Checklist> checklists) {
        record Checklist(String name, List<ChecklistItem> items) {}
        record ChecklistItem(String text) {}
    }

    static final class TrelloException extends RuntimeException {
        private final String code;

        TrelloException(String code, String message) {
            super(message);
            this.code = code;
        }

        String code() {
            return code;
        }
    }
}
```

Preserve null-safe name matching, no-match `null` behavior, encounter order for the single returned
match, and the existing exception codes and messages. The lookup only needs to distinguish zero
matches, exactly one match, and at least two matches, so do not scan or retain matches after
ambiguity is already proven. If both lookup methods need the same zero, one, or ambiguous branch,
extract that branch into a small shared helper while keeping the predicates and error contracts
domain-specific. Keep the code small.
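The following sketch is for illustration only. It is not the repository's reference answer. It shows one refactoring that fits this task: filter the matches, cap the stream with `limit(2)`, and branch on the size of that bounded list. The class name `ChecklistLookupSketch` and the helper `singleOrNull` are hypothetical. The records are trimmed, un-nested copies of the task's types.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class ChecklistLookupSketch {
    static Checklist singleChecklistByName(List<Checklist> checklists, String checklistName) {
        return singleOrNull(
                checklists,
                // Null-safe match, as in the original loop.
                checklist -> Objects.equals(checklist.name(), checklistName),
                () -> new TrelloException(
                        "trello_checklist_ambiguous",
                        "Multiple Trello checklists match the requested checklist_name."));
    }

    static ChecklistItem singleCheckItemByName(Checklist checklist, String itemName) {
        return singleOrNull(
                checklist.items(),
                item -> Objects.equals(item.text(), itemName),
                () -> new TrelloException(
                        "trello_check_item_ambiguous",
                        "Multiple Trello checklist items match the requested item_name."));
    }

    // Hypothetical shared helper: null for zero matches, the match for exactly one,
    // and the supplied exception for two or more. limit(2) stops the search once a
    // second match proves ambiguity, so at most two matches are ever retained.
    private static <T> T singleOrNull(
            List<T> values, Predicate<? super T> matches, Supplier<? extends RuntimeException> ambiguous) {
        List<T> found = values.stream().filter(matches).limit(2).toList();
        if (found.size() > 1) {
            throw ambiguous.get();
        }
        return found.isEmpty() ? null : found.get(0);
    }

    // Trimmed copies of the task's types (the task nests these records inside a Card record).
    record Checklist(String name, List<ChecklistItem> items) {}

    record ChecklistItem(String text) {}

    static final class TrelloException extends RuntimeException {
        private final String code;

        TrelloException(String code, String message) {
            super(message);
            this.code = code;
        }

        String code() {
            return code;
        }
    }
}
```

`limit(2)` is a short-circuiting operation, so a sequential stream stops pulling elements after the second match. `findFirst()` would instead silently accept the first of several duplicates, which is the shortcut the criteria reject.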
@@ -0,0 +1,2 @@
Audit Optional-returning stream terminals and choose findAny only when encounter order is not part
of the result contract.
50 changes: 50 additions & 0 deletions evals-reference/30-prefer-findany-equivalent-matches/criteria.json
@@ -0,0 +1,50 @@
{
"context": "Reference focused cleanup: findAny should express equivalent-match lookups while findFirst remains for ordered contracts.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Creates coherent Java 17 artifact",
"category": "safety",
"max_score": 8,
"description": "Returns revised Java 17-compatible LookupTerminals code with imports, methods, helper, and record intact."
},
{
"name": "Uses findAny for equivalent matches",
"category": "stream_quality",
"max_score": 30,
"description": "Changes exact or normalized configured-name lookups to findAny because all valid matches are equivalent or expected to be unique by contract."
},
{
"name": "Preserves ordered first-match contracts",
"category": "stream_quality",
"max_score": 24,
"description": "Keeps findFirst for PATH-style search order and first output line selection where encounter order selects the result."
},
{
"name": "Explains retained findFirst calls",
"category": "maintainability",
"max_score": 12,
"description": "Adds concise comments or equivalent explanation for each retained findFirst call that identify the order contract rather than relying on current sequential behavior."
},
{
"name": "Avoids mechanical replacement",
"category": "safety",
"max_score": 14,
"description": "Does not replace every findFirst mechanically, does not use findAny for fallback or first-line behavior, and does not claim tests alone prove order irrelevance."
},
{
"name": "Keeps filters and normalization",
"category": "safety",
"max_score": 12,
"description": "Preserves case-insensitive list matching, closed-list filtering, normalization, path resolution, and output-line filtering."
}
],
"metadata": {
"invocation": "natural",
"task_type": "cleanup",
"evidence_type": "focused_reference",
"issue": "https://github.com/martinfrancois/java-streams-skill/issues/51",
"reference_selection": "Focused issue #51 coverage for findAny versus findFirst semantic audits.",
"runtime_reference_overlap_rationale": "Allowed only as reference-suite focused coverage; do not report as ordinary broad lift if runtime references later teach this same audit shape."
}
}
49 changes: 49 additions & 0 deletions evals-reference/30-prefer-findany-equivalent-matches/task.md
@@ -0,0 +1,49 @@
# Audit Optional stream terminals

Refactor `LookupTerminals.java` only where the terminal operation's contract is clearer. Assume Java 17.

Return the revised Java code and one brief comment beside each retained `findFirst()` explaining
why the first match is semantically required.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class LookupTerminals {
    static Optional<String> detectedList(List<String> openListNames, String expectedName) {
        return openListNames.stream()
                .filter(name -> name.equalsIgnoreCase(expectedName))
                .findFirst();
    }

    static Optional<BoardList> targetList(List<BoardList> lists, String configuredName) {
        String expected = normalize(configuredName);
        return lists.stream()
                .filter(list -> !list.closed())
                .filter(list -> normalize(list.name()).equals(expected))
                .findFirst();
    }

    static Optional<Path> firstExistingPath(List<Path> searchPath, String commandName) {
        return searchPath.stream()
                .map(path -> path.resolve(commandName))
                .filter(path -> path.toFile().exists())
                .findFirst();
    }

    static Optional<String> firstVersionLine(String output) {
        return output.lines()
                .map(String::stripLeading)
                .filter(line -> line.startsWith("java "))
                .findFirst();
    }

    private static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").strip();
    }

    record BoardList(String id, String name, boolean closed) {}
}
```
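The sketch below shows how this audit might come out for two of the four methods. It assumes board-list names are unique by contract, which is the premise the criteria give for `findAny`. Under that assumption, any open list with the normalized name is an acceptable result. `LookupTerminalsSketch` is a hypothetical name, and `detectedList` and `firstExistingPath` are omitted.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class LookupTerminalsSketch {
    static Optional<BoardList> targetList(List<BoardList> lists, String configuredName) {
        String expected = normalize(configuredName);
        return lists.stream()
                .filter(list -> !list.closed())
                .filter(list -> normalize(list.name()).equals(expected))
                // Assumed unique-by-contract name: any open list with this normalized name is acceptable.
                .findAny();
    }

    static Optional<String> firstVersionLine(String output) {
        return output.lines()
                .map(String::stripLeading)
                .filter(line -> line.startsWith("java "))
                // findFirst kept: the earliest matching output line is the result contract.
                .findFirst();
    }

    private static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").strip();
    }

    record BoardList(String id, String name, boolean closed) {}
}
```

On a sequential stream, `findAny()` typically returns the first match anyway. Tests therefore cannot show that order is irrelevant. The switch is justified only by the uniqueness contract, which is why the order-dependent `firstVersionLine` keeps `findFirst()`.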
@@ -0,0 +1,2 @@
Replace simple temporary append buffers with direct stream-owned immutable results while keeping
mutable builders where they remain clearer.
50 changes: 50 additions & 0 deletions evals-reference/31-immutable-result-append-list/criteria.json
@@ -0,0 +1,50 @@
{
  "context": "Reference focused cleanup: a temporary mutable append list can become a direct immutable result when mutability is not part of the method contract.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Creates coherent Java 17 artifact",
      "category": "safety",
      "max_score": 8,
      "description": "Returns complete Java 17-compatible ManifestUpdate code with imports, records, methods, and constructor behavior intact."
    },
    {
      "name": "Replaces simple append buffer",
      "category": "stream_quality",
      "max_score": 28,
      "description": "Refactors withBoard to produce filtered existing boards plus the new board directly, for example with Stream.concat and Stream.of, instead of creating a mutable append buffer."
    },
    {
      "name": "Preserves encounter order",
      "category": "safety",
      "max_score": 18,
      "description": "Keeps all retained existing boards in original order and appends the new board after them."
    },
    {
      "name": "Audits result mutability",
      "category": "stream_quality",
      "max_score": 14,
      "description": "Recognizes that the manifest constructor copies input, so the temporary list mutability is not part of the public result contract."
    },
    {
      "name": "Keeps complex builder when clearer",
      "category": "maintainability",
      "max_score": 16,
      "description": "Does not force withOptionalSections into a dense stream when the conditional append and optional summary row are clearer as a small builder or loop."
    },
    {
      "name": "Avoids unrelated changes",
      "category": "safety",
      "max_score": 16,
      "description": "Does not change sameBoardOrWorkflow semantics, archive filtering, record fields, constructor copying, or method signatures."
    }
  ],
  "metadata": {
    "invocation": "natural",
    "task_type": "cleanup",
    "evidence_type": "focused_reference",
    "issue": "https://github.com/martinfrancois/java-streams-skill/issues/50",
    "reference_selection": "Focused issue #50 coverage for immutable result production over temporary append lists.",
    "runtime_reference_overlap_rationale": "Allowed only as reference-suite focused coverage; do not report as ordinary broad lift if runtime references later teach this same shape."
  }
}
51 changes: 51 additions & 0 deletions evals-reference/31-immutable-result-append-list/task.md
@@ -0,0 +1,51 @@
# Remove unnecessary temporary mutability

Refactor `ManifestUpdate.java` where doing so improves readability without changing behavior.
Assume Java 17.

Return the revised Java code only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

record ConnectedBoardManifest(List<ConnectedBoard> boards) {
    ConnectedBoardManifest {
        boards = List.copyOf(boards);
    }

    ConnectedBoardManifest withBoard(ConnectedBoard board) {
        List<ConnectedBoard> updated = new ArrayList<>(boards.stream()
                .filter(existing -> !sameBoardOrWorkflow(existing, board))
                .toList());
        updated.add(board);
        return new ConnectedBoardManifest(updated);
    }

    ConnectedBoardManifest withOptionalSections(List<ConnectedBoard> selected, boolean includeArchived) {
        List<ConnectedBoard> updated = new ArrayList<>();
        for (ConnectedBoard board : selected) {
            if (!board.archived() || includeArchived) {
                updated.add(board);
            }
        }
        if (includeArchived) {
            updated.add(new ConnectedBoard("archive-summary", null, true));
        }
        return new ConnectedBoardManifest(updated);
    }

    private static boolean sameBoardOrWorkflow(ConnectedBoard left, ConnectedBoard right) {
        return left.boardId().equals(right.boardId())
                || left.workflowPath() != null && left.workflowPath().equals(right.workflowPath());
    }
}

record ConnectedBoard(String boardId, String workflowPath, boolean archived) {}
```

The manifest constructor copies its input. Preserve encounter order, filtering, duplicate handling,
and public API shape. Only refactor the simple temporary append-buffer case when the stream result
stays readable; leave the conditional builder method imperative if the current loop is clearer than a
dense stream expression.
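The sketch below shows one possible `withBoard` rewrite. It relies on the task's own premise that the compact constructor copies its input with `List.copyOf`, so the intermediate list does not need to be mutable. This is an illustration, not the reference answer. Only `withBoard` is shown; `withOptionalSections` would stay as the existing loop.

```java
import java.util.List;
import java.util.stream.Stream;

record ConnectedBoardManifest(List<ConnectedBoard> boards) {
    ConnectedBoardManifest {
        // The constructor owns immutability, so callers may pass any list.
        boards = List.copyOf(boards);
    }

    ConnectedBoardManifest withBoard(ConnectedBoard board) {
        // Retained boards keep their original order; the new board is appended last.
        return new ConnectedBoardManifest(Stream.concat(
                        boards.stream().filter(existing -> !sameBoardOrWorkflow(existing, board)),
                        Stream.of(board))
                .toList());
    }

    private static boolean sameBoardOrWorkflow(ConnectedBoard left, ConnectedBoard right) {
        return left.boardId().equals(right.boardId())
                || left.workflowPath() != null && left.workflowPath().equals(right.workflowPath());
    }
}

record ConnectedBoard(String boardId, String workflowPath, boolean archived) {}
```

`Stream.concat` preserves the encounter order of both inputs. The filtered existing boards therefore come first and the new board comes last, matching the original `add` at the end of the buffer.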
2 changes: 2 additions & 0 deletions evals-reference/32-predicate-loop-any-match/capability.txt
@@ -0,0 +1,2 @@
Replace pure predicate loops with anyMatch while keeping side effects, diagnostics, and indexes out
of stream pipelines.
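The sketch below illustrates the distinction this capability names. It uses hypothetical `hasBlankLabel` and `firstBlankIndex` helpers over a list of strings, not code from the repository. A loop whose only job is to answer yes or no maps onto `anyMatch`. A loop that reports an index keeps its loop form.

```java
import java.util.List;

final class PredicateLoopSketch {
    // Before: a pure predicate loop with no side effects.
    static boolean hasBlankLabelLoop(List<String> labels) {
        for (String label : labels) {
            if (label.isBlank()) {
                return true;
            }
        }
        return false;
    }

    // After: anyMatch short-circuits on the first blank label, like the early return above,
    // and returns false for an empty list.
    static boolean hasBlankLabel(List<String> labels) {
        return labels.stream().anyMatch(String::isBlank);
    }

    // Kept as a loop: the result is a position, which a stream pipeline would obscure.
    static int firstBlankIndex(List<String> labels) {
        for (int i = 0; i < labels.size(); i++) {
            if (labels.get(i).isBlank()) {
                return i;
            }
        }
        return -1;
    }
}
```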