
Enhance YAML parser with flow collection preprocessing and tag handling #2

Merged: rjrodger merged 11 commits into main from claude/yaml-parser-compliance-review-SfJH0 on Mar 12, 2026


@rjrodger
Contributor

Summary

This PR improves YAML spec compliance by adding a flow collection preprocessing pass and %TAG directive handling. These changes let the parser handle YAML-specific features that Jsonic's core parser doesn't natively support, and they also fix edge cases in block scalar and multiline key handling. With these changes, all 374 YAML Test Suite tests pass with none skipped.

Key Changes

Flow Collection Preprocessing

  • Added a preprocessFlowCollections() function that rewrites flow collection syntax into a Jsonic-compatible form before parsing
  • Handles implicit null-valued keys in flow mappings: {a, b: c} → {a: ~, b: c}
  • Strips comments that appear between a key and its colon in flow context
  • Folds multiline quoted scalars in flow collections onto a single line
  • Supports explicit key indicators (?) inside flow collections
  • Converts [? key : val] syntax to [{key: val}] for proper parsing
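
The implicit-null rewrite listed above can be pictured as a string-to-string pass. This is a minimal sketch under stated assumptions, not the PR's preprocessFlowCollections(): the helpers splitTopLevel, hasKeyValueColon and addImplicitNulls are hypothetical names, the sketch handles one flow mapping at a time, and it ignores escapes inside quoted strings.

```typescript
// Hypothetical helper (not from the PR): split a flow mapping body on
// top-level commas, skipping commas inside nested brackets or quotes.
function splitTopLevel(body: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let quote: string | null = null;
  let start = 0;
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (quote) {
      if (ch === quote) quote = null; // escapes are ignored in this sketch
      continue;
    }
    if (ch === '"' || ch === "'") quote = ch;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") depth--;
    else if (ch === "," && depth === 0) {
      parts.push(body.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(body.slice(start));
  return parts;
}

// Hypothetical helper: true if the entry has a top-level ':' that acts as a
// key-value separator (followed by a space, a tab, or the end of the entry).
function hasKeyValueColon(entry: string): boolean {
  let depth = 0;
  let quote: string | null = null;
  for (let i = 0; i < entry.length; i++) {
    const ch = entry[i];
    if (quote) {
      if (ch === quote) quote = null;
      continue;
    }
    if (ch === '"' || ch === "'") quote = ch;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") depth--;
    else if (ch === ":" && depth === 0) {
      const next = entry.charAt(i + 1); // "" past the end
      if (next === "" || next === " " || next === "\t") return true;
    }
  }
  return false;
}

// Hypothetical sketch: give bare keys an explicit null, "{a, b: c}" -> "{a: ~, b: c}".
function addImplicitNulls(flowMap: string): string {
  const text = flowMap.trim();
  if (!text.startsWith("{") || !text.endsWith("}")) return flowMap;
  const entries = splitTopLevel(text.slice(1, -1))
    .map((e) => e.trim())
    .filter((e) => e.length > 0) // tolerates a trailing comma
    .map((e) => (hasKeyValueColon(e) ? e : `${e}: ~`));
  return `{${entries.join(", ")}}`;
}
```

For example, addImplicitNulls("{a, b: c}") yields "{a: ~, b: c}", and a quoted key such as "x:y" is not mistaken for a key-value pair because the scan skips quoted text.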

TAG Directive Support

  • Added tagHandles map to track %TAG directive mappings
  • Parses %TAG directives during source cleanup and stores them before document processing
  • Skips built-in type conversion (int, float, bool, null) when !! is redefined by a custom %TAG directive
  • Treats a !!-tagged value as a plain string in that case, so documents can give !! their own meaning without values being coerced to YAML's default types
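
A rough sketch of the %TAG bookkeeping follows. The names parseTagDirectives, useBuiltinTypes and YAML_CORE_PREFIX are hypothetical. The sketch assumes directives appear only before the first "---" line, and it treats a %TAG that maps !! back to the standard tag:yaml.org,2002: prefix as not redefining it, which is an assumption rather than something the PR states.

```typescript
// Standard prefix of the secondary tag handle "!!" in YAML 1.2.
const YAML_CORE_PREFIX = "tag:yaml.org,2002:";

// Hypothetical sketch: collect handle -> prefix pairs from %TAG directives.
// Assumes directives only appear before the "---" document start marker.
function parseTagDirectives(src: string): Map<string, string> {
  const tagHandles = new Map<string, string>();
  for (const line of src.split("\n")) {
    if (line.startsWith("---")) break; // directives end at the document start
    const m = /^%TAG\s+(\S+)\s+(\S+)/.exec(line);
    if (m) tagHandles.set(m[1], m[2]);
  }
  return tagHandles;
}

// Built-in conversion for !!int, !!float, etc. stays on only while "!!"
// still resolves to the standard YAML prefix (an assumption of this sketch).
function useBuiltinTypes(tagHandles: Map<string, string>): boolean {
  const secondary = tagHandles.get("!!");
  return secondary === undefined || secondary === YAML_CORE_PREFIX;
}
```

With "%TAG !! tag:example.com,2000:app/" in the directive block, useBuiltinTypes returns false, so a value like "!!int 1" would be kept as a plain string instead of being converted.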

Block Scalar Key Handling

  • Added support for block scalar indicators (| and >) in explicit keys
  • Properly handles chomping indicators (+, -) and explicit indentation
  • Correctly processes folded vs. literal block scalars in key position
  • Tracks the extra rows consumed by the block scalar so that row positions stay accurate
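
The block scalar header handling above involves decoding the style, chomping and indentation indicators and then applying the chomping mode. The sketch below illustrates the YAML rules involved; parseBlockHeader, applyChomp and BlockHeader are hypothetical names, and the header is assumed to carry no trailing comment.

```typescript
// Hypothetical shape of a decoded block scalar header such as "|", ">-", "|2" or ">1+".
interface BlockHeader {
  style: "literal" | "folded";
  chomp: "clip" | "strip" | "keep";
  indent?: number; // explicit indentation indicator (1-9), if present
}

// YAML allows the indentation and chomping indicators in either order.
function parseBlockHeader(header: string): BlockHeader | null {
  const m = /^([|>])(?:([1-9])([+-])?|([+-])([1-9])?)?$/.exec(header.trim());
  if (!m) return null;
  const chompCh = m[3] ?? m[4];
  const digit = m[2] ?? m[5];
  return {
    style: m[1] === "|" ? "literal" : "folded",
    chomp: chompCh === "-" ? "strip" : chompCh === "+" ? "keep" : "clip",
    indent: digit ? Number(digit) : undefined,
  };
}

// Applies the chomping mode to the collected content's trailing line breaks.
function applyChomp(content: string, chomp: BlockHeader["chomp"]): string {
  const body = content.replace(/\n+$/, "");
  if (chomp === "strip") return body; // drop every trailing line break
  if (chomp === "clip") return body.length > 0 ? body + "\n" : ""; // keep exactly one
  return content; // keep: preserve all trailing line breaks
}
```

For "a\nb\n\n", clip gives "a\nb\n", strip gives "a\nb", and keep leaves the input unchanged.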

Alias Resolution Improvements

  • Resolves aliases immediately in the lexer when the anchor is already known, since deferred alias markers can be lost during Jsonic's rule processing of indented values
  • Defers resolution only when the anchor hasn't been seen yet
  • Properly handles alias values as map keys
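
The resolve-now-or-defer decision above reduces to a lookup in the anchors seen so far. The names resolveAlias and AliasResult, and the idea of an anchors map keyed by name, are hypothetical illustrations rather than the plugin's actual code.

```typescript
// Hypothetical result type: either the anchored value, or a marker that a
// later pass must resolve once the anchor has been recorded.
type AliasResult =
  | { kind: "value"; value: unknown }
  | { kind: "deferred"; name: string };

// Resolve immediately when the anchor is already known; otherwise defer.
function resolveAlias(name: string, anchors: Map<string, unknown>): AliasResult {
  if (anchors.has(name)) {
    return { kind: "value", value: anchors.get(name) };
  }
  return { kind: "deferred", name };
}
```

Returning the value directly in the common case means no placeholder has to survive later rule processing. Only unseen anchors take the deferred path.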

Tag Processing Enhancements

  • Handles standalone tags (a tag followed directly by a newline) by consuming the newline and the next line's leading spaces
  • Prevents spurious indent tokens after standalone tags
  • Correctly identifies tag boundaries when colons appear in tag names
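
Consuming the line break after a standalone tag can be sketched as below. skipAfterStandaloneTag is a hypothetical name, pos is assumed to point just past the tag text, and trailing comments after the tag are not handled.

```typescript
// Hypothetical sketch: if only whitespace and a line break follow the tag,
// skip the break and the next line's leading spaces, so the tag attaches to
// the node below it without producing an extra indent token.
function skipAfterStandaloneTag(
  src: string,
  pos: number,
): { pos: number; rowsConsumed: number } {
  let i = pos;
  while (src[i] === " " || src[i] === "\t") i++;
  if (src[i] === "\r" && src[i + 1] === "\n") i += 2;
  else if (src[i] === "\n") i += 1;
  else return { pos, rowsConsumed: 0 }; // content follows on the same line
  while (src[i] === " ") i++; // leading indentation of the next line
  return { pos: i, rowsConsumed: 1 };
}
```

For "!foo\n  bar" with pos = 4, the scan lands on "bar" at index 7 and reports one consumed row. For "!foo bar", nothing is consumed.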

Sequence Marker Detection

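  • Improved sequence marker ("- ") detection: a continuation line starting with "- " is treated as a new entry only when its indent matches the nearest enclosing sequence marker's indent
  • Prevents false positives in continuation lines, where "- " at any other indent is plain scalar text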
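
The indent comparison for "- " can be sketched as a single predicate. isNewSeqEntry is a hypothetical name, and seqIndents is an assumed stack of the columns of enclosing "- " markers, innermost last.

```typescript
// Hypothetical sketch: a continuation line starting with "- " begins a new
// sequence entry only when its indent equals the innermost enclosing
// sequence marker's indent. YAML indentation is counted in spaces only.
function isNewSeqEntry(line: string, seqIndents: number[]): boolean {
  const indent = /^ */.exec(line)![0].length;
  if (!line.slice(indent).startsWith("- ")) return false;
  // At any other indent, "- " is plain scalar continuation text.
  return seqIndents.length > 0 && indent === seqIndents[seqIndents.length - 1];
}
```

With an enclosing marker at indent 2, "  - b" starts a new entry, while "    - b" is continuation text.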

Position Tracking Fixes

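  • Fixed the column index for explicit keys with values, which was previously reset to 1 after ": value" and broke multi-entry sequences as values of explicit keys
  • Properly tracks row increments for multiline keys and block scalars
  • Uses the colon line's indent, rather than the previous line's, when computing continuation indent for map values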
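
Instead of hard-coding a column, you can measure it from the last line break. columnAt is a hypothetical helper, and the columns it returns are assumed to be 1-based.

```typescript
// Hypothetical sketch: 1-based column of the character at pos, measured from
// the start of its own line instead of being assumed to be 1.
function columnAt(src: string, pos: number): number {
  const lineStart = pos === 0 ? 0 : src.lastIndexOf("\n", pos - 1) + 1;
  return pos - lineStart + 1;
}
```

For "? a\n: - x\n  - y", both "- " markers land at column 3. With the column pinned to 1, the first entry would appear misaligned with the second.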

Test Suite Updates

  • Removed 14 test cases from the SKIP list that now pass with these improvements
  • Tests cover flow collections, block scalars, tags, aliases, and complex nesting scenarios

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3

claude added 11 commits March 12, 2026 09:39
Per YAML spec, anchor and alias names can contain colons and other
special characters. Only terminate alias names at colon when followed
by space/tab (key-value separator context). Also resolve aliases
immediately in the lexer when the anchor exists, since deferred
markers can be lost through Jsonic's rule processing for indented values.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
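
Here is a sketch of the alias-name rule in this commit. scanAliasName is a hypothetical name, and start is assumed to point just past the "*". The sketch also stops at flow indicators (, [ ] { }), which the YAML spec excludes from anchor names. That stop comes from the spec, not from the commit.

```typescript
// Hypothetical sketch: read an alias name. A ':' ends the name only when it
// is followed by a space or tab (key-value separator context), so "*a:b"
// names the anchor "a:b".
function scanAliasName(src: string, start: number): string {
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") break;
    if (ch === "," || ch === "[" || ch === "]" || ch === "{" || ch === "}") break;
    if (ch === ":") {
      const next = src[i + 1];
      if (next === " " || next === "\t") break;
    }
    i++;
  }
  return src.slice(start, i);
}
```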
Parse %TAG directives during source cleanup and store handle→prefix
mappings. When !! has been redefined by %TAG, skip built-in type
conversion (!!int, !!float, etc.) and treat the value as a plain string.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
When a block scalar indicator (|2, >1) appears on a separate line from
the mapping colon (e.g., after a standalone tag like !foo), look backward
to find the parent mapping key's indent for correct blockIndent calculation.
Also consume trailing newline after standalone local tags to prevent
extra #IN tokens.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
When a continuation line starts with "- ", check whether its indent
matches the nearest enclosing sequence marker's indent. Only treat
it as a new sequence entry at the matching indent level — at other
indents, it's plain scalar text continuation.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
The explicit key handler set pnt.cI = 1 after processing ": value",
causing incorrect column info for the element marker. This broke
multi-entry sequences as values of explicit keys. Fix by computing
the actual column position on the value line.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
…, JTV5)

- Handle block scalar indicators (| and >) as explicit key content,
  parsing indented continuation lines as literal/folded block scalar text
- Fix keyIndent calculation in text.check to use current line indent
  (colon's line) instead of previous line indent for map value continuation

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
…KB6, 9BXH, CT4Q, K3WX)

Add flow collection preprocessor to handle YAML-specific features that
Jsonic's core parser doesn't natively support:
- Implicit null-valued keys in flow mappings: {a, b: c} → {a: ~, b: c}
- Comments between key and colon in flow context
- Multiline quoted scalars in flow collections
- Explicit keys (?) inside flow sequences

All 374 YAML Test Suite tests now pass with 0 skipped.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
Implements a Go version of the @jsonic.dev/yaml parser as a Jsonic plugin,
following the same architecture as the CSV Go plugin. Includes:

- yaml.go: Public API (Parse, MakeJsonic), helper functions, YAML value maps
- plugin.go: Yaml plugin with custom matcher, TextCheck for block scalars/
  tags/plain scalars, anchor/alias handling, quoted strings, indentation
- grammar.go: YAML-specific grammar rules for block mappings, sequences,
  indent-based nesting, element markers, merge keys
- yaml_test.go: 93 tests covering block mappings, sequences, scalar types,
  quoted strings, block scalars, flow collections, comments, anchors/aliases,
  merge keys, documents, tags, complex keys, directives, indentation,
  multiline scalars, CRLF, and real-world patterns

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
Runs the build and tests on Ubuntu, Windows, and macOS for both
the TypeScript (Node 24) and Go (1.24) implementations.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
On Windows, readFileSync returns \r\n line endings which cause the
YAML parser to fail on certain inputs (e.g. LE5A with trailing empty
lines after !!str). Normalize to \n when reading in.yaml test files.

https://claude.ai/code/session_01H3rUS9E1u5eXZrzyYiBPB3
rjrodger merged commit f1a79e1 into main on Mar 12, 2026
6 checks passed