feat: HWPX (한글/Hancom) document handler#54
Conversation
|
Hi @lidge-jun, thanks for the incredible effort here — a full HWPX handler with 180 tests is no small feat, and we appreciate your interest in expanding OfficeCLI's format coverage. After reviewing the PR, we've decided not to merge this into Plugin-based approach for HWPX Contribution guidelines Next steps Thanks again for the work — looking forward to collaborating! 안녕하세요 @lidge-jun 님, 먼저 이렇게 큰 작업을 해주셔서 정말 감사합니다. 180개의 테스트를 포함한 HWPX 핸들러 전체 구현은 결코 쉬운 작업이 아닌데, OfficeCLI의 포맷 확장에 관심을 가져주셔서 감사드립니다. 검토 결과, 현 시점에서는 HWPX는 플러그인 방식으로 지원 예정 외부 기여 가이드라인 관련 앞으로의 방향 다시 한번 수고해 주신 점 감사드립니다. 앞으로의 협업을 기대하겠습니다! |
aa992bf to
3ce806a
Compare
3b0d6dd to
6534bdc
Compare
Establishes the contract for sidecar plugins that extend officecli format support. Defines three plugin kinds (dump-reader, exporter, format-handler), named-pipe IPC framing, manifest schema via --info, four discovery paths, error/exit code conventions, and main-to-plugin stability commitments. Motivated by PR #54 (HWPX) needing a format-handler path, legacy .doc support needing license-flexible isolation, and PDF export needing to stay out of the main binary's dependency tree. v0 draft — subject to revision before v1 ratification.
The earlier fix re-anchored crossBetween in ChartHelper.Axis's axis-role setter, but the chart-level `crossbetween` case in ChartHelper.Setter still used AppendChild and landed crossBetween after majorUnit / minorUnit. Dumped charts that go through the chart-level Set path (bar, column) replayed with an out-of-order valAx that PowerPoint silently rejected. Apply the same anchor chain (CrossesAt → Crosses → CrossingAxis → InsertAfterSelf) on the chart-level path so both entry points land crossBetween in the right position regardless of whether the source build emitted majorUnit before the cb set.
`add chart --prop series1.outlineColor=#FF0000` (and outlineWidth / outlineDash) silently dropped because DeferredSeriesSubkeys only listed the compound `outline` key. IsDeferredKey returned false for the dotted subkey form, so the property was handled before the series element existed in the chart XML, then discarded. Add `outlinecolor`, `outlinewidth`, `outlinedash` to DeferredSeriesSubkeys so they route through HandleSeriesDottedProperty after build, matching the existing dotted-subkey contract.
ConnectorToNode had per-key readback for outline width, dash, solid color, and alpha, but no branch for <a:gradFill> on the connector's <a:ln>. A connector authored with a gradient line (or one whose gradient was applied via `set --prop line.gradient=…`) round-tripped to a bare <a:ln/> on dump→batch replay because the spec key never surfaced in Get; replay then fell back to the theme's default thin stroke. Read the GradientFill via ReadGradientString and emit `line.gradient`, mirroring the existing shape-outline gradient readback.
A series authored with `trendline=poly:3` or `trendline=movingAvg:5` round-tripped to bare `poly` / `movingAvg` on Get and dump, silently dropping the polynomial degree and moving-average window size. The reader only emitted <c:trendlineType val>; the adjacent <c:order val> and <c:period val> children were ignored. Format the spec via FormatTrendlineSpec, which appends `:N` for poly (from <c:order>) and movingAvg (from <c:period>). Other trendline types (linear, log, exp, power) keep their bare-name spec. Applied at both the chart-level trendline summary and the per-series readback so dump→batch replay rebuilds the exact curve.
ReadGradientSpec only appended the `:angle` suffix when the linear gradient's angle was non-zero, so a series gradient authored as `gradient=A5A5A5-D0D0D0:0` round-tripped to bare `A5A5A5-D0D0D0` — the explicit zero-degree (left→right) direction collapsed into "no angle specified", letting the writer fall back to the schema default on the next round-trip. Emit `:0` whenever <a:lin ang=…> is present, regardless of value. The angle is a meaningful user choice (0 = horizontal, 90 = vertical, …) and must survive dump→batch replay verbatim.
Shape gradients use `C1-C2-ANGLE` while chart series gradients use `C1-C2:ANGLE`. Users routinely mix the two — feeding a chart-style spec into a shape (`fill=FF0000-00FF00:45` or `gradient=…:90`) parsed the colon as part of the second color token, then the parser either rejected the value or silently dropped the angle. Normalize a trailing `:N` (integer in [-360, 360]) to `-N` at the front of NormalizeGradientValue when the input is not already a typed prefix (radial:/path:/linear;). Get/dump still emit each surface's canonical separator unchanged, so existing round-trips and the schema contract are untouched.
…pec call InnerText is a nullable reference type; pass "" when null so the non-nullable parameter contract holds. The guarding HasValue check ensures Val itself is non-null, but does not propagate to InnerText.
…idline booleans Bug: charts-* dump→replay lost minorGridlineColor / minorGridlineWidth on 10/14 chart files. Two compounding defects in the chart-axis emit step. Root cause 1 (path): EmitChartAxesIfDifferent emitted `set /slide[N]/chart[@id=X]/axis[@ROLE=Y]` using whichever path the live chart's Get returned (`@id=X` when cNvPr.Id is present). On replay the chart is freshly added and gets a NEW cNvPr.Id — the original X no longer matches. The chart-axis Set regex only accepts positional `chart[(\d+)]`, so every @id= axis-set step silently failed to dispatch. Root cause 2 (semantic overlap): the axis emit re-emitted `minorGridlines=true` / `majorGridlines=true` alongside the chart-level `add` row that already carried `minorGridlines=true` PLUS `minorGridlineColor=#F0F0F0` and `minorGridlineWidth=0.25`. Once the path bug was fixed and the set step actually ran, ChartHelper.Setter's "minorgridlines" case does `valAxis.RemoveAllChildren<MinorGridlines>()` and re-adds a bare one — nuking the color/width siblings the chart-level add had just installed. Fix: pass a positional chart ordinal (already tracked as `ord["chart"]` in the slide-children walk) into EmitChart and use it to build a separate `replayChartPath = /slide[N]/chart[K]` for the emitted set row. The live `@id=X` path is still used for reading axes off the source file. Drop majorGridlines/minorGridlines booleans from axis emit entirely — the chart-level keys round-trip the full gridline state including color and width without going through the destructive RemoveAllChildren path. Verified: dump→replay of charts-radar.pptx preserves minorGridlineColor #F0F0F0 and minorGridlineWidth 0.25 on slide[4]/chart[2].
Bug: chart dump→replay dropped ALL trendlines off every series. The
per-series flatten loop in EmitChart enumerated color/lineWidth/lineDash/
marker/etc. but never included "trendline", "trendline.dispRSqr",
"trendline.dispEq". Reader emits these on series.Format and the chart
Setter accepts series{N}.trendline / series{N}.trendline.dispRSqr /
series{N}.trendline.dispEq — only the emit side was incomplete.
Fix: add the three keys to the flatten list. Per-series trendline state
now round-trips on dump→replay.
Bug: Set already accepted line.gradient on both connector and shape (routed through BuildGradientFill into the outline), but Add silently dropped it. On dump→replay the chart-emit step output an `add shape` or `add connector` row carrying line.gradient and replay either lost the gradient entirely (connector — solidFill from the line= fallback parsed line.gradient's "color1-color2" string as a single bogus color) or surfaced unsupported_property (shape — key not in effectKeys allowlist). Fix: - AddConnector: check line.gradient / linegradient BEFORE the solid-color fallback chain. Build the GradientFill via the same BuildGradientFill helper Set uses, append to the outline. - AddShape effectKeys: add "linegradient" / "line.gradient" so the key routes through SetRunOrShapeProperties, which already handles it. Verified: add connector and add shape with line.gradient=red-blue both round-trip through Get back to linear;#FF0000;#0000FF;90.
Bug: passing series{N}.color (or the colors= shorthand) on
`chartType=scatter` was silently dropped. BuildBarChart, BuildLineChart,
BuildRadarChart, BuildBubbleChart all take a `string[]? colors` parameter
and ApplySeriesColor per index; BuildScatterChart did not — it called
BuildScatterSeries which only sets x/y values, leaving the series to
inherit the theme's default series color.
Fix: thread `colors` through BuildScatterChart's call from the chart-type
switch (line 177) and apply per-series via the same ApplySeriesColor
helper bubble/radar use. Default-color path is unchanged (when no colors
supplied, theme palette wins as before).
Verified: add chart chartType=scatter with series1.color=#FF0000 /
series2.color=#00FF00 → Get reports both series colors back.
Bug: series{N}.lineDash=sysDashDot (and the longer sysDashDotDot,
lgDash, lgDashDot, lgDashDotDot, plus their snake_case _dot_dot
variant) silently mapped to Solid. The Reader emits the OOXML-native
camelCase form (PresetDash.Val.InnerText is the schema name
"sysDashDot"), so dump→replay rounded every dotted/long-dash line back
to a solid one.
ParseDashStyle's switch lowercases input then matched only "sysdash_dot"
/ "longdash*" — neither survived ToLowerInvariant() of "sysDashDot" or
"lgDashDot".
Fix:
- ParseDashStyle: add lowercase OOXML forms (sysdashdot, sysdashdotdot,
lgdash, lgdashdot, lgdashdotdot) and the missing
SystemDashDotDot enum mapping. Snake_case variants retained for
backwards compat.
- IsKnownDashStyle (Advanced.cs referenceLine heuristic): mirror the
same set so a numeric-label string that happens to contain a dash
name still distinguishes a style token from a label.
Verified: add chart series1.lineDash=sysDashDot round-trips through
Get back to lineDash: sysDashDot (not solid).
Bug: a series carrying multiple trendlines (Excel allows e.g. linear + polynomial together) round-tripped to just the first. The Reader called GetFirstChild<C.Trendline>() so the second/third trendline never reached seriesNode.Format. Even after Reader saw them, the Setter's trendline= case only built one element from the supplied string. Fix: - Reader: enumerate Elements<C.Trendline>() per series; emit FormatTrendlineSpec for each and join with ';' (the spec itself already uses ':' for poly:N / movingAvg:N so ';' is unambiguous). dispRSqr / dispEq mirror the FIRST trendline's display flags (chart-level fan-out targets every trendline anyway). - SetterHelpers per-series `trendline` case: split semicolon-joined value and append-or-replace each spec, retaining the existing single-spec append-or-replace behaviour for inputs without ';'. - Setter chart-level `trendline` case: mirror the same split so the chart-wide fan-out reinstalls every trendline on every series. Verified: a line series with linear + exp survives dump→batch replay and reads back as `trendline: linear;exp`.
…t drop Bug: ReadGradientSpec returned null when a series gradient had fewer than two stops, dropping the gradient entirely on dump. ApplySeriesGradient had the symmetric blind spot — it returned early when the spec parsed to fewer than two colors. A 1-stop gradient is a corner-case (Excel / PowerPoint normally seed two stops) but does occur in hand-edited and third-party files; silently losing it on round-trip is wrong. Fix: - ReadGradientSpec: emit when stops.Count >= 1 (was >= 2). A single stop becomes a single colour token; only zero stops still return null. - ApplySeriesGradient: drop empty tokens, treat a single colour as a duplicate-pair so the resulting GradientFill has >=2 stops (Drawing schema requires it for valid stop positions) and renders as a uniform fill — visually equivalent to a solidFill of that colour. Behaviour for the common 2-3 stop case is unchanged.
…ashDot OOXML PresetLineDashValues has both `dashDot` and `sysDashDot` as distinct dash presets. The parser previously collapsed `dashDot`, `sysdash_dot`, and `sysdashdot` all to SystemDashDot, so a chart line authored with `line.dash=dashDot` round-tripped as the system variant and rendered with the wrong on-paper pattern. Split the aliases: - `dashdot` -> DashDot - `sysdashdot` / `sysdash_dot` -> SystemDashDot OOXML schema has no `dashDotDot`, only `sysDashDotDot` and the long-dash variants -- those remain unchanged.
…allback BuildTrendline accepted `type:order:backward` colon-separated specs. An unrecognized type (e.g. user-supplied `linear|exp` pipe-list, expecting per-series fan-out) silently fell through the switch's default arm to TrendlineValues.Linear, so a chart authored with two different trendlines collapsed to one Linear trendline with no warning. Make `linear` an explicit arm and throw CliException(invalid_value) on unknown types, listing the valid set and pointing at the per-series form `seriesN.trendline=...` for mixed-type cases.
… axis SetAxisProperties had no direct-apply branch for majorUnit / minorUnit, so a role-scoped `set /chart/axis[@ROLE=value2] --prop majorUnit=N` forwarded to the chart-level `majorunit` case in SetChartProperties. That handler resolves the target via plotArea.GetFirstChild<ValueAxis>(), which is always the primary axis. The Set returned success while the secondary axis kept its original tick interval and the primary's was silently overwritten. Mirror the existing min/max/crosses pattern: when role is value/value2, apply MajorUnit / MinorUnit directly to the resolved targetAxis via InsertValAxChildInOrder so the schema order under CT_ValAx is preserved. Non-value roles ignore the key (schema does not declare it on category/series).
EmitChart fans series-level Format keys out into props as `seriesN.trendline=...` so multi-series varied trendlines round-trip. The Reader also emits a chart-level summary key `trendline` pinned to the first series with a trendline. When both made it into the add row, replay called BuildTrendline TWICE on series 1 -- once via the chart-level shortcut, once via series1.trendline -- doubling its trendline collection. For a deck with series1=linear, series2=exp the chart-level key overwrote series1's spec with whichever type the scanner happened to see first. Drop chart-level trendline / trendline.dispRSqr / trendline.dispEq after fan-out when any series carries a trendline -- the per-series keys are sufficient and unambiguous.
The Reader emitted only the bare ErrorBarValueType (`percentage`, `fixedVal`) on each series and never surfaced the magnitude. The PptxBatchEmitter then never fanned `errbars` out as a `seriesN.errbars` prop, so dump→replay dropped the entire <c:errBars> element. Replaying a chart authored with `errbars=percent:10` opened with no error-bar shapes at all, and the chart-level Setter's all-series sweep on `errbars=percent` (without a magnitude) produced bars with the OOXML default value. Emit `series.errbars` as a `type:value` spec mirroring the BuildErrorBars input form: read magnitude from <c:val>, falling back to the first <c:plus>/<c:minus> NumericLiteral point. Normalize OOXML type names back to the input vocabulary (`percentage` -> `percent`, `stdDev` -> `stddev`, etc.). Add `errbars` to the per-series fan-out list in EmitChart so dump emits `seriesN.errbars=...` rows that the existing per-series Setter case rebuilds verbatim at replay.
Adds fixtureClassCoverage to HWP/HWPX manifests and corpus schema covering the eight Phase 36.2 required classes: multi-section, merged-cell-tables, nested-tables, pictures-bindata, headers-footers, equations, unicode-edge-cases, malformed-hwpx-package Each class is recorded as verified, blocked, or external-manual with typed reason for blocked classes. malformed-hwpx-package stays blocked with reason fixture_validation_failed. External-manual entries do not declare verifiedOperations and do not count toward capability evidence. New tests in HwpCompatibilityCorpusTests: RequiredFixtureClassesAreRepresented MalformedHwpxFixturesAreBlockedNotVerified ExternalManualFixturesDoNotCountAsVerified dotnet build officecli.slnx PASS HWP-filter tests 169 PASS full OfficeCli.Tests 210 PASS Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add roundtrip-case.v1.schema.json defining case shape with allOf conditional that forces blocked status to carry a typed error. - Add roundtrip-cases.json with 5 cases: HWP replace_text/fill_field/ set_table_cell via rhwp-bridge, HWPX replace_text via custom, and HWPX set_table_cell blocked with reason roundtrip_unverified. - Add HwpRoundTripCorpusTests with 5 invariants: CasesConformToSchema, MutationCasesLeaveSourceUnchanged, BlockedCasesReturnTypedErrors, SuccessCasesPassProviderReadback, RealRoundTripExecutionIsOptIn (gated on OFFICECLI_REAL_RHWP_BIN). - Cross-reference round-trip catalog from expected-capabilities.json and document it in docs/qa/compatibility-corpus.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add visual-thresholds.json declaring hard-fail conditions (page-count-mismatch, missing-expected-svg-page, missing-render-evidence-for-visual-validated-operation), thresholded-fail metrics, and the visual-validated operation list. - Add docs/qa/visual-diff-thresholds.md describing the policy and marking the renderer status as deferred. - Add HwpVisualDiffThresholdTests with 3 invariants: PageCountMismatchIsHardFail, MissingRenderEvidenceFailsVisualClaim, TextOnlyMutationUsesDeclaredThreshold. - Cross-reference visual policy from docs/qa/compatibility-corpus.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add provider-compatibility.json catalog with 12 rows covering HWP and HWPX operations across rhwp-bridge and custom providers, with typed blockedReason values and hancomLane=optional everywhere. - Add docs/qa/provider-compatibility-matrix.md describing schema, defaults, blocked-row policy, and the four enforcement tests. - Add HwpProviderCompatibilityMatrixTests with 4 invariants: HwpxCustomRemainsDefault, RhwpPromotionRequiresEvidenceParity, HancomLaneIsOptionalNotCiRequired, BlockedProviderRowsHaveTypedReasons. - Cross-reference matrix from docs/qa/compatibility-corpus.md and expected-capabilities.json. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add docs/qa/phase-36-release-gate.md with required artifacts, required tests, acceptance commands, allowed/forbidden claim text. - Cross-reference release gate from docs/qa/compatibility-corpus.md. - Extend HwpCompatibilityCorpusTests with three gate invariants: Phase36ReleaseGateRequiresAllCorpusArtifacts checks every required schema/manifest/doc/fixture exists and that fixtureClassCoverage is populated on each manifest; NoDocxParityLanguageBeforeScorecard scans all guarded docs and fails if 'DOCX parity' (or Korean/'parity with DOCX' variants) appears outside an explicit forbidden-claim/scorecard/must-not guard; BlockedOperationsRemainMachineReadable verifies every unsupported capability and provider matrix row carries a typed reason from the HwpCapabilityConstants enum. With 36.6 the Phase 36 plan is closed: round-trip harness complete, visual diff thresholds complete, provider compatibility matrix complete, release gate complete. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add structure/ docs (INDEX, AGENTS, file/function map, command reference, format support, providers, test infra) as the agent-facing source-of-truth for feat/hwpx. - Link structure/INDEX.md from README.md and add an experimental HWP/HWPX row to the format matrix (evidence-gated, no parity claim). - Rewrite BRANCH_STRATEGY.md to reflect feat/hwpx as the active branch and document the docs/structure-init-2026-05-06 fast-forward path. - Fix a garbled sentence in structure/03-format-support.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add IDocumentHandler.Save() to HwpxHandler (new upstream interface requirement) - Restore Run variable in WordHandler.Add.Text.cs (lost in conflict resolution) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Establishes the contract for sidecar plugins that extend officecli format support. Defines three plugin kinds (dump-reader, exporter, format-handler), named-pipe IPC framing, manifest schema via --info, four discovery paths, error/exit code conventions, and main-to-plugin stability commitments. Motivated by PR iOfficeAI#54 (HWPX) needing a format-handler path, legacy .doc support needing license-flexible isolation, and PDF export needing to stay out of the main binary's dependency tree. v0 draft — subject to revision before v1 ratification.
Establishes the contract for sidecar plugins that extend officecli format support. Defines three plugin kinds (dump-reader, exporter, format-handler), named-pipe IPC framing, manifest schema via --info, four discovery paths, error/exit code conventions, and main-to-plugin stability commitments. Motivated by PR #54 (HWPX) needing a format-handler path, legacy .doc support needing license-flexible isolation, and PDF export needing to stay out of the main binary's dependency tree. v0 draft — subject to revision before v1 ratification.
Summary
Hi! I'm a Korean developer building cli-jaw, a CLI orchestrator that relies heavily on OfficeCLI. First of all, thank you for this incredible tool — it's been invaluable for our document automation workflows.
Korean users frequently work with
.hwpxfiles (Hancom/한글 Office format, the de facto standard for government and business documents in Korea). This PR adds comprehensive HWPX support to OfficeCLI.This doesn't need to be merged into
mainright away. If you'd prefer, please feel free to merge it into afeat/hwpxbranch so we can iterate together before it's production-ready.What's included
--from-markdownoption forcreatecommandNotes
main(b5cd116)TryExtractBinaryis stubbed out (binary extraction not yet implemented)Happy to address any feedback. Would love to collaborate on making this production-ready!
Test plan
dotnet build— 0 errorsdotnet test— 180 passed, 0 failed