Merged
56 changes: 55 additions & 1 deletion .cognition/skills/debug-exiftool/SKILL.md
@@ -93,6 +93,16 @@ cd perl5_t/t
../../jperl op/lexsub.t
```

### Running Perl5 core tests that use subprocess tests
Tests using `run_multiple_progs()` or `fresh_perl_is()` spawn `jperl` as a subprocess. This requires `jperl` to be in PATH:
```bash
# Using the test runner (handles PATH automatically):
perl dev/tools/perl_test_runner.pl perl5_t/t/op/eval.t

# Manual running (PATH must be set on the jperl command itself, not on cd):
cd perl5_t/t && PATH="/Users/fglock/projects/PerlOnJava2:$PATH" ../../jperl op/eval.t
```

## Comparing with System Perl

When debugging, compare PerlOnJava output with native Perl to isolate the difference:
@@ -285,7 +295,7 @@ Key files for the interpreter:

## Current Test Status (as of 2026-03-03)

### ExifTool Test Results: 524/600 planned (87%)
### ExifTool Test Results: 590/600 planned (98%)

| Test | Pass/Planned | Status |
|------|-------------|--------|
@@ -358,6 +368,10 @@ Various format-specific write issues. Many may share root causes with P1 (mandat
| Dynamic variables | `runtime/runtimetypes/DynamicVariableManager.java` |
| IO operations | `runtime/runtimetypes/RuntimeIO.java` |
| IO operator (open/dup) | `runtime/operators/IOOperator.java` |
| Control flow (goto/labels) | `backend/jvm/EmitControlFlow.java` |
| Dereference / slicing | `backend/jvm/Dereference.java` |
| Variable emission (refs) | `backend/jvm/EmitVariable.java` |
| String parser (qw, heredoc) | `frontend/parser/StringParser.java` |
| String operators | `runtime/operators/StringOperators.java` |
| Pack/Unpack | `runtime/operators/PackOperator.java` |
| Regex preprocessor | `runtime/regex/RegexPreprocessor.java` |
@@ -385,6 +399,46 @@ If a fix only patches ONE of these paths (e.g., `capturedVarIndices` check in `v
### Ordering matters for capturedVars
`SubroutineParser` builds `paramList` by iterating `getAllVisibleVariables()` (TreeMap sorted by register index) with specific filters. `detectClosureVariables()` must use the **exact same iteration order and filters**. Any mismatch causes captured variable values to be assigned to wrong registers at runtime.
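
A minimal sketch of why the order and filters must match. None of the names below come from PerlOnJava: `collect`, the register numbers, and the `scalarsOnly` filter are hypothetical stand-ins for the two iteration sites. Captured values are assumed to be matched to registers purely by position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical illustration; not the actual SubroutineParser code.
final class CaptureOrderSketch {
    // Walk variables in register-index order (TreeMap ordering)
    // and keep only the names that pass the filter.
    static List<String> collect(TreeMap<Integer, String> visibleByRegister,
                                Predicate<String> filter) {
        List<String> names = new ArrayList<>();
        for (String name : visibleByRegister.values()) {
            if (filter.test(name)) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> visible = new TreeMap<>();
        visible.put(5, "$b");
        visible.put(3, "$a");
        visible.put(4, "@skipped");

        Predicate<String> scalarsOnly = n -> n.startsWith("$");
        List<String> paramList = collect(visible, scalarsOnly);
        List<String> sameFilter = collect(visible, scalarsOnly);
        List<String> otherFilter = collect(visible, n -> true);

        System.out.println(paramList);                    // [$a, $b]
        System.out.println(sameFilter.equals(paramList)); // true
        System.out.println(otherFilter);                  // [$a, @skipped, $b]
    }
}
```

With the mismatched filter, position 1 holds `@skipped` on one side and `$b` on the other, so a positional assignment would put the value of `$b` into the wrong slot.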

### goto LABEL across JVM scope boundaries
`EmitControlFlow.handleGotoLabel()` resolves labels at compile time within the current JVM scope. When the target label is outside the current scope (e.g., goto inside a `map` block to a label outside, or goto inside an `eval` block), the compile-time lookup fails. The fix is to emit a `RuntimeControlFlowList` marker with `ControlFlowType.GOTO` at runtime (the same mechanism used by dynamic `goto EXPR`), allowing the goto signal to propagate up the call stack. This was a blocker for both op/array.t and op/eval.t.

### List slice with range indices
In `Dereference.handleArrowArrayDeref()`, the check for single-index vs slice path must account for range expressions (`..` operator). A range like `0..5` is a single AST node but produces multiple indices. The correct condition is: use single-index path only if there's one element AND it's not a range. Otherwise, use the slice path. The old code had a complex `isArrayLiteral` check that was too restrictive.
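
The condition described above can be sketched as follows. The node types here (`NumberNode`, `RangeNode`) and the method name `useSingleIndexPath` are hypothetical and stand in for whatever AST classes `Dereference` actually inspects; only the shape of the check is taken from the text.

```java
import java.util.List;

// Hypothetical illustration; not the actual Dereference code.
final class SlicePathSketch {
    interface Node {}
    record NumberNode(int value) implements Node {}
    // Models a `from..to` range: one AST node, many indices.
    record RangeNode(int from, int to) implements Node {}

    // Single-index path only for exactly one element that is not a range;
    // everything else goes through the slice path.
    static boolean useSingleIndexPath(List<Node> indexElements) {
        return indexElements.size() == 1
                && !(indexElements.get(0) instanceof RangeNode);
    }

    public static void main(String[] args) {
        System.out.println(useSingleIndexPath(List.of(new NumberNode(3))));   // true
        System.out.println(useSingleIndexPath(List.of(new RangeNode(0, 5)))); // false
        System.out.println(useSingleIndexPath(
                List.of(new NumberNode(1), new NumberNode(2))));              // false
    }
}
```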

### qw() backslash processing
`StringParser.parseWordsString()` must apply single-quote backslash rules to each word: `\\` → `\` and `\delimiter` → `delimiter`. Without this, backslashes are doubled in the output. The processing uses the closing delimiter from the qw construct.
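
A minimal sketch of the per-word rule, assuming the word has already been split out and the closing delimiter is known. `unescapeWord` is a hypothetical helper, not the real `StringParser` method, and it handles only the two cases named above (backslash-backslash and backslash-delimiter); any other backslash is left as-is.

```java
// Hypothetical illustration; not the actual StringParser code.
final class QwBackslashSketch {
    // Applies single-quote backslash rules to one qw() word.
    static String unescapeWord(String word, char closingDelimiter) {
        StringBuilder out = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == '\\' && i + 1 < word.length()) {
                char next = word.charAt(i + 1);
                if (next == '\\' || next == closingDelimiter) {
                    out.append(next); // drop the escaping backslash
                    i++;              // skip the escaped character
                    continue;
                }
            }
            out.append(c); // any other character, including a lone backslash
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeWord("a\\\\b", ')')); // a\b
        System.out.println(unescapeWord("x\\)y", ')'));  // x)y
        System.out.println(unescapeWord("n\\t", ')'));   // n\t
    }
}
```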

### `\(LIST)` must flatten arrays before creating refs
`\(@array)` should create individual scalar refs to each array element (like `map { \$_ } @array`), not a single ref to the array. `EmitVariable` needs a `flattenElements()` method that detects `@` sigil nodes in the list and flattens them before creating element references.

### Squashing a diverged branch with `git diff` + `git apply`
When a feature branch has diverged far from master (thousands of commits in common history), both `git rebase` and `git merge --squash` can produce massive conflicts across dozens of files. The clean workaround:
```bash
# 1. Generate a patch of ONLY the branch's changes vs master
git diff master..feature-branch > /tmp/branch-diff.patch
# 2. Create a fresh branch from current master
git checkout master && git checkout -b feature-branch-clean
# 3. Apply the patch (no merge history = no conflicts)
git apply /tmp/branch-diff.patch
# 4. Commit as a single squashed commit
git add -A && git commit -m "Squashed: ..."
# 5. Force push over the PR's branch to update the PR
git push --force origin feature-branch-clean:feature-branch
```
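This works because `git diff master..feature-branch` compares the two branch tips directly and produces the exact file-level delta, bypassing all the intermediate merge history that causes conflicts. The resulting tree is identical to `feature-branch`, so the branch must already contain master's latest changes; anything that exists only on master would be reverted by the patch.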

### Always commit fixes before rebasing
Uncommitted working tree changes are lost when `git rebase --abort` is run. If you have a fix in progress (e.g., a BitwiseOperators change), commit it first — even as a WIP commit — before attempting any rebase. The rebase abort restores the branch to its pre-rebase state, which does NOT include uncommitted changes.

### `getInt()` vs `(int) getLong()` for 32-bit integer wrapping
`RuntimeScalar.getInt()` clamps DOUBLE values to `Integer.MAX_VALUE` (e.g., `(int) 2147483648.0 == 2147483647`). But `(int) getLong()` wraps correctly via long→int truncation (e.g., `(int) 2147483648L == -2147483648`). For `use integer` operations where Config.pm reports `ivsize=4`, always use `(int) getLong()` to get proper 32-bit wrapping behavior matching Perl's semantics.
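
The difference comes straight from Java's narrowing conversions, and can be seen without any PerlOnJava classes. The sketch below uses plain casts; it assumes that `getLong()` on a DOUBLE value behaves like a `(long)` cast, which is an assumption about `RuntimeScalar`, not something shown here.

```java
// Plain-Java illustration of saturating vs. wrapping conversion.
final class IntWrapSketch {
    // double -> int saturates at Integer.MIN_VALUE / Integer.MAX_VALUE.
    static int viaDoubleCast(double d) {
        return (int) d;
    }

    // double -> long -> int keeps only the low 32 bits, so it wraps.
    static int viaLongCast(double d) {
        return (int) (long) d;
    }

    public static void main(String[] args) {
        System.out.println(viaDoubleCast(2147483648.0)); // 2147483647
        System.out.println(viaLongCast(2147483648.0));   // -2147483648
        System.out.println(viaLongCast(4294967296.0));   // 0
        System.out.println(viaLongCast(-2147483649.0));  // 2147483647
    }
}
```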

### scalar gmtime/localtime ctime(3) format
Perl's scalar `gmtime`/`localtime` returns ctime(3) format, with the day of month space-padded to width 2 and the year unpadded: `"Fri Mar  7 20:13:52 881"`, NOT RFC 1123 (`"Fri, 7 Mar 0881 20:13:52 GMT"`). Use `String.format()` with explicit field widths, not `DateTimeFormatter`. Also: wday must use `getValue() % 7` (Perl: 0=Sun..6=Sat), not `getValue()` (Java: 1=Mon..7=Sun). Large years (>9999) must not crash the formatter.
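
A minimal sketch of the formatting approach, using `java.time` only to break the instant into fields and `String.format()` for the layout. The class and method names (`CtimeSketch`, `perlWday`, `ctime`) are hypothetical and do not reflect where PerlOnJava implements this.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

// Hypothetical illustration; not the actual PerlOnJava time operator.
final class CtimeSketch {
    private static final String[] DAYS =
            {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private static final String[] MONTHS =
            {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};

    // Java: 1=Mon..7=Sun. Perl: 0=Sun..6=Sat.
    static int perlWday(ZonedDateTime t) {
        return t.getDayOfWeek().getValue() % 7;
    }

    // ctime(3)-style layout: day padded with a space, year printed as-is.
    static String ctime(ZonedDateTime t) {
        return String.format(Locale.ROOT, "%s %s %2d %02d:%02d:%02d %d",
                DAYS[perlWday(t)], MONTHS[t.getMonthValue() - 1],
                t.getDayOfMonth(), t.getHour(), t.getMinute(),
                t.getSecond(), t.getYear());
    }

    public static void main(String[] args) {
        ZonedDateTime t = Instant.ofEpochSecond(1_000_000_000L).atZone(ZoneOffset.UTC);
        System.out.println(ctime(t)); // Sun Sep  9 01:46:40 2001
        ZonedDateTime far = ZonedDateTime.of(12345, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(ctime(far)); // five-digit year, no exception
    }
}
```

Because the year goes through a plain `%d`, years below 1000 or above 9999 are printed with however many digits they need.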

### Regression testing: always compare branch vs master
Before declaring a fix complete, run the same test on both master and the branch to distinguish real regressions from pre-existing failures. Use `perl5_t/t/` (not `perl5/t/`) for running Perl5 core tests — the `perl5_t` copy has test harness files (`test.pl`, `charset_tools.pl`) that PerlOnJava can load.

## Adding Debug Instrumentation

In ExifTool Perl code (temporary, never commit):
@@ -69,23 +69,20 @@ JSR-223 is the standard Java scripting API, available since Java 6. It allows bi

**`perlonjava-3.0.0.jar`** — 25 MB, zero external dependencies

**Same JAR runs on Linux, macOS, and Windows** — no recompilation.

```
perlonjava.jar
├── org/perlonjava/ ← 392 Java compiled classes
├── lib/ ← 341 Perl modules (DBI, JSON, HTTP::Tiny…)
├── runtime/nativ/ ← Platform abstraction (POSIX ↔ Win32 via JNA)
├── ASM, ICU4J, JNA ← Java libraries bundled
└── META-INF/services ← JSR-223 auto-discovery
├── org/perlonjava/ ← 392 Java compiled classes
├── lib/ ← 341 Perl modules (DBI, JSON, HTTP::Tiny…)
├── ASM, ICU4J, JNA ← Java libraries bundled
└── META-INF/services ← JSR-223 auto-discovery
```

`java -jar perlonjava.jar script.pl` — or `./jperl` / `jperl.bat`
`java -jar perlonjava.jar script.pl` — that's it.

Also ships as: **Debian package** (`make deb`) · **Docker image** (`docker build -t perlonjava .`)
Or use `./jperl script.pl` — a wrapper that also supports `$CLASSPATH` for JDBC drivers.

Note:
Built with Gradle Shadow plugin (fat JAR). Perl modules live in src/main/perl/lib and are packaged as resources inside the JAR. The require mechanism reads them directly from the JAR via classloader. The nativ/ package uses JNA to call POSIX libc on Unix and Kernel32 on Windows — same Perl code, platform-specific native calls handled transparently. The jperl wrapper uses -cp instead of -jar so users can add extra JARs to CLASSPATH. Docker: `docker build -t perlonjava .` then `docker run perlonjava script.pl`. Debian: `make deb` creates a .deb in build/distributions/, install with `sudo dpkg -i`.
Built with Gradle Shadow plugin (fat JAR). Perl modules live in src/main/perl/lib and are packaged as resources inside the JAR. The require mechanism reads them directly from the JAR via classloader. No installation, no CPAN, no paths to configure. The jperl wrapper uses -cp instead of -jar so users can add extra JARs to CLASSPATH.

---

@@ -592,11 +592,10 @@ Also: globalIORefs → IO, globalFormatRefs → FORMAT. Slot access: *foo{CODE}

- Loads **Java extensions** instead of C shared libraries
- **JNA** (Java Native Access) replaces XS for native calls
- `nativ/` package: POSIX libc on Unix, Kernel32 on Windows — **same JAR, all platforms**
- No C compiler needed

Note:
The nativ/ package provides cross-platform implementations of symlink, link, getppid, getuid/gid, chmod, chown, kill, and more. NativeUtils detects the OS at startup and routes each call to the appropriate native API. ExtendedNativeUtils adds user/group info, network ops, and System V IPC. Java equivalents are easier to write and maintain than C/XS.
Java equivalents are easier to write and maintain than C/XS. The same API surface is exposed to Perl code.

---

39 changes: 17 additions & 22 deletions src/main/java/org/perlonjava/backend/bytecode/BytecodeCompiler.java
@@ -2138,28 +2138,6 @@ void compileVariableDeclaration(OperatorNode node, String op) {
continue;
}

// local @x / local %x in list form
if ((sigil.equals("@") || sigil.equals("%")) && sigilOp.operand instanceof IdentifierNode idNode) {
String varName = sigil + idNode.name;
if (hasVariable(varName)) {
throwCompilerException("Can't localize lexical variable " + varName);
}

String globalVarName = NameNormalizer.normalizeVariableName(idNode.name, getCurrentPackage());
int nameIdx = addToStringPool(globalVarName);

int rd = allocateRegister();
if (sigil.equals("@")) {
emitWithToken(Opcodes.LOCAL_ARRAY, node.getIndex());
} else {
emitWithToken(Opcodes.LOCAL_HASH, node.getIndex());
}
emitReg(rd);
emit(nameIdx);
varRegs.add(rd);
continue;
}

if (sigilOp.operand instanceof IdentifierNode) {
String varName = sigil + ((IdentifierNode) sigilOp.operand).name;

@@ -3206,7 +3184,10 @@ void compileVariableReference(OperatorNode node, String op) {
BlockNode block = (BlockNode) node.operand;

// Check strict refs at compile time — mirrors JVM path in EmitVariable.java
int savedCtx = currentCallContext;
currentCallContext = RuntimeContextType.SCALAR;
block.accept(this);
currentCallContext = savedCtx;
int blockResultReg = lastResultReg;
int rd = allocateRegister();
if (isStrictRefsEnabled()) {
@@ -3336,7 +3317,10 @@ void compileVariableReference(OperatorNode node, String op) {
// @{ block } - evaluate block and dereference the result
// The block should return an arrayref
BlockNode blockNode = (BlockNode) node.operand;
int savedCtx = currentCallContext;
currentCallContext = RuntimeContextType.SCALAR;
blockNode.accept(this);
currentCallContext = savedCtx;
int refReg = lastResultReg;

// Dereference to get the array
@@ -3429,7 +3413,10 @@ void compileVariableReference(OperatorNode node, String op) {
}
} else if (node.operand instanceof BlockNode blockNode) {
// %{ block } — evaluate block and dereference to hash
int savedCtx = currentCallContext;
currentCallContext = RuntimeContextType.SCALAR;
blockNode.accept(this);
currentCallContext = savedCtx;
int scalarReg = lastResultReg;
int hashReg = allocateRegister();
if (isStrictRefsEnabled()) {
@@ -3973,6 +3960,10 @@ private void visitNamedSubroutine(SubroutineNode node) {
// Sub-compiler will use RETRIEVE_BEGIN opcodes for closure variables
InterpretedCode subCode = subCompiler.compile(node.block);

if (RuntimeCode.DISASSEMBLE) {
System.out.println(subCode.disassemble());
}

// Step 5: Emit bytecode to create closure or simple code ref
int codeReg = allocateRegister();

@@ -4060,6 +4051,10 @@ private void visitAnonymousSubroutine(SubroutineNode node) {
// Sub-compiler will use parentRegistry to resolve captured variables
InterpretedCode subCode = subCompiler.compile(node.block);

if (RuntimeCode.DISASSEMBLE) {
System.out.println(subCode.disassemble());
}

// Step 5: Create closure or simple code ref
int codeReg = allocateRegister();

@@ -1043,14 +1043,20 @@ public static RuntimeList execute(InterpretedCode code, RuntimeArray args, int c

case Opcodes.HASH_SET: {
// Hash element store: hash{key} = value
// Must copy the value into a new scalar for the hash element,
// because the source register may be modified in-place later
// (e.g. $hash{k} = $fix; $fix = {} would clear $hash{k} otherwise)
// Uses addToScalar to properly resolve special variables ($1, $2, etc.)
int hashReg = bytecode[pc++];
int keyReg = bytecode[pc++];
int valueReg = bytecode[pc++];
RuntimeHash hash = (RuntimeHash) registers[hashReg];
RuntimeScalar key = (RuntimeScalar) registers[keyReg];
RuntimeBase valBase = registers[valueReg];
RuntimeScalar val = (valBase instanceof RuntimeScalar) ? (RuntimeScalar) valBase : valBase.scalar();
hash.put(key.toString(), ensureMutableScalar(val));
RuntimeScalar copy = new RuntimeScalar();
val.addToScalar(copy);
hash.put(key.toString(), copy);
break;
}

@@ -2340,6 +2346,7 @@ public static RuntimeList execute(InterpretedCode code, RuntimeArray args, int c
case Opcodes.VEC:
case Opcodes.LOCALTIME:
case Opcodes.GMTIME:
case Opcodes.RESET:
case Opcodes.CRYPT:
case Opcodes.CLOSE:
case Opcodes.BINMODE: