Add Linux/macOS test matrix and portability fallbacks by jstar0 · Pull Request #37 · nano-step/eval-harness

jstar0 · 2026-06-04T16:09:28Z

Add Linux/macOS test matrix and portability fallbacks

Summary

Fixes #24.

Add a GitHub Actions workflow that runs every scripts/eval/tests/*.sh suite on ubuntu-latest and macos-latest.
Make the harness test path work on macOS Bash 3.2 by replacing mapfile usage and removing grep -P from the pre-push hook.
Keep the documented python3 stdlib path working without PyYAML by adding stdlib fallback coverage for the yq shim.
Add portable timeout and SHA-256 helpers so the test path can use python3/shasum fallbacks when GNU tools are unavailable.

Changes

Add .github/workflows/tests.yml with a Linux/macOS matrix for the shell test suite.
Add scripts/eval/lib/portable.sh for SHA-256 and timeout fallbacks.
Update manifest, stability, spawn, run, and hook paths to use portable helpers and Bash 3-compatible array loading.
Extend test coverage for stdlib YAML parsing, portable tool fallbacks, and pre-push hook skill detection.

Why

Issue #24 asks for a CI matrix that verifies the full shell test suite on Linux and macOS. Running the suite locally on macOS exposed a few portability blockers that would make that matrix fail or depend on non-base tools:

macOS /bin/bash 3.2 does not provide mapfile.
grep -P is not portable to BSD grep.
a clean Python environment may not have PyYAML even though the project documents a python3 stdlib fallback.
timeout/sha256sum availability differs across macOS environments.

The workflow and portability fixes are kept together so the new macOS job has a realistic chance of validating the existing suite instead of immediately failing on known toolchain assumptions.

Verification

for t in ./scripts/eval/tests/*.sh; do echo "== $(basename "$t")"; bash "$t"; done

PATH=/usr/bin:/bin:/usr/sbin:/sbin bash -lc 'for t in ./scripts/eval/tests/*.sh; do echo "== $(basename "$t")"; bash "$t"; done'

npm test

Risk

The largest change is the stdlib fallback path in the yq shim. I added focused coverage for the YAML shapes the harness currently relies on: nested maps, inline lists, block scalars, inline comments, list iteration, indexed objects, and default expressions. The workflow provisions jq when missing; timeout and SHA-256 portability are handled inside the harness rather than hidden by installing GNU coreutils in CI.

gemini-code-assist

Code Review

This pull request introduces portability improvements to the evaluation harness scripts, making them compatible with macOS and environments lacking GNU coreutils or PyYAML. Key changes include replacing GNU-specific commands (like timeout and mapfile) with portable shell alternatives, implementing a fallback YAML parser in Python's standard library, and introducing a portable.sh utility library. The review feedback highlights a few remaining compatibility and parsing issues: the use of the GNU-specific sort -z option in manifest.sh which will fail on macOS, and edge cases in the custom YAML parser (_yq.py) where inline comments on keys or list items without values are not correctly stripped, leading to parsing errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-04T16:12:00Z

    skill_bundle_sha="$(cd "$skills_root" && find . -type f \( -name "*.md" -o -name "*.sh" -o -name "*.yaml" -o -name "*.json" \) -print0 \
      | sort -z \
-      | xargs -0 sha256sum 2>/dev/null \
-      | sha256sum \
+      | while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \
+      | portable_sha256_stdin \
      | cut -d' ' -f1)"


The sort -z option is a GNU extension and is not supported by the default BSD sort utility on macOS. Since this PR specifically aims to support macOS without requiring GNU coreutils, using sort -z will cause the manifest generation to fail on macOS.

A highly portable alternative to sort null-terminated lines is to translate null bytes to newlines, sort them, and translate them back using tr.

Suggested change

skill_bundle_sha="$(cd "$skills_root" && find . -type f $ -name "*.md" -o -name "*.sh" -o -name "*.yaml" -o -name "*.json" $ -print0 \

| sort -z \

| xargs -0 sha256sum 2>/dev/null \

| sha256sum \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

skill_bundle_sha="$(cd "$skills_root" && find . -type f $ -name "*.md" -o -name "*.sh" -o -name "*.yaml" -o -name "*.json" $ -print0 \

| tr '\0' '\n' | sort | tr '\n' '\0' \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

gemini-code-assist · 2026-06-04T16:12:01Z

    skill_sha="$(cd "$skill_dir" && find . -type f -print0 \
      | sort -z \
-      | xargs -0 sha256sum 2>/dev/null \
-      | sha256sum \
+      | while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \
+      | portable_sha256_stdin \
      | cut -d' ' -f1)"


The sort -z option is a GNU extension and is not supported by the default BSD sort utility on macOS. Use tr to portably sort null-terminated lines.

Suggested change

skill_sha="$(cd "$skill_dir" && find . -type f -print0 \

| sort -z \

| xargs -0 sha256sum 2>/dev/null \

| sha256sum \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

skill_sha="$(cd "$skill_dir" && find . -type f -print0 \

| tr '\0' '\n' | sort | tr '\n' '\0' \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

gemini-code-assist · 2026-06-04T16:12:01Z

    fixture_sha="$(cd "$EVAL_FIXTURE_DIR" && find . -type f -print0 \
      | sort -z \
-      | xargs -0 sha256sum 2>/dev/null \
-      | sha256sum \
+      | while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \
+      | portable_sha256_stdin \
      | cut -d' ' -f1)"


The sort -z option is a GNU extension and is not supported by the default BSD sort utility on macOS. Use tr to portably sort null-terminated lines.

Suggested change

fixture_sha="$(cd "$EVAL_FIXTURE_DIR" && find . -type f -print0 \

| sort -z \

| xargs -0 sha256sum 2>/dev/null \

| sha256sum \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

fixture_sha="$(cd "$EVAL_FIXTURE_DIR" && find . -type f -print0 \

| tr '\0' '\n' | sort | tr '\n' '\0' \

| while IFS= read -r -d '' file; do portable_sha256_file "$file"; done \

| portable_sha256_stdin \

| cut -d' ' -f1)"

gemini-code-assist · 2026-06-04T16:12:01Z

+        key, raw_value = split_key_value(text)
+        if key is None:
+            break
+        key = parse_scalar(key)
+        if raw_value in ("|", ">"):
+            out[key], i = parse_block_scalar(lines, i + 1, ind, raw_value)
+            continue
+        if raw_value != "":
+            out[key] = parse_scalar(raw_value)
+            i += 1
+            continue


If a key has an inline comment but no value (e.g., unsafe_shell: # comment), raw_value will contain the comment string (e.g., "# comment"). Because raw_value != "" is true, it will parse the comment as an empty string instead of None/null.

We should strip the inline comment from raw_value before checking if it is empty.

Suggested change

key, raw_value = split_key_value(text)

if key is None:

break

key = parse_scalar(key)

if raw_value in ("|", ">"):

out[key], i = parse_block_scalar(lines, i + 1, ind, raw_value)

continue

if raw_value != "":

out[key] = parse_scalar(raw_value)

i += 1

continue

key, raw_value = split_key_value(text)

if key is None:

break

key = parse_scalar(key)

val_clean = strip_inline_comment(raw_value).strip()

if val_clean in ("|", ">"):

out[key], i = parse_block_scalar(lines, i + 1, ind, val_clean)

continue

if val_clean != "":

out[key] = parse_scalar(val_clean)

i += 1

continue

gemini-code-assist · 2026-06-04T16:12:01Z

+        if raw_value == "":
+            item = {key: None}
+            if j < len(lines) and indent_of(lines[j]) > ind:
+                item[key], i = parse_block(lines, j, indent_of(lines[j]))
+        else:
+            item = {key: parse_scalar(raw_value)}
+            if j < len(lines) and indent_of(lines[j]) > ind:
+                rest, i = parse_dict(lines, j, indent_of(lines[j]))
+                item.update(rest)


If a list item key has an inline comment (e.g., - shell: # comment), raw_value will contain the comment string. This causes the parser to incorrectly execute the else block, setting the value to "" and flattening any nested block structures (e.g., cmd: echo under shell will be merged into the parent dictionary instead of nested under shell).

We should strip the inline comment from raw_value before checking if it is empty.

Suggested change

if raw_value == "":

item = {key: None}

if j < len(lines) and indent_of(lines[j]) > ind:

item[key], i = parse_block(lines, j, indent_of(lines[j]))

else:

item = {key: parse_scalar(raw_value)}

if j < len(lines) and indent_of(lines[j]) > ind:

rest, i = parse_dict(lines, j, indent_of(lines[j]))

item.update(rest)

val_clean = strip_inline_comment(raw_value).strip()

if val_clean == "":

item = {key: None}

if j < len(lines) and indent_of(lines[j]) > ind:

item[key], i = parse_block(lines, j, indent_of(lines[j]))

else:

item = {key: parse_scalar(val_clean)}

if j < len(lines) and indent_of(lines[j]) > ind:

rest, i = parse_dict(lines, j, indent_of(lines[j]))

item.update(rest)

ci: add macOS test matrix and portability fallbacks

d81f31f

gemini-code-assist Bot reviewed Jun 4, 2026

View reviewed changes

fix: address portability review gaps

825fc99

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Linux/macOS test matrix and portability fallbacks#37

Add Linux/macOS test matrix and portability fallbacks#37
jstar0 wants to merge 2 commits into
nano-step:mainfrom
jstar0:ci/test-matrix

jstar0 commented Jun 4, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jstar0 commented Jun 4, 2026

Add Linux/macOS test matrix and portability fallbacks

Summary

Changes

Why

Verification

Risk

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant