feat: ensure ffi converts to markdown #7

Merged: TheOutdoorProgrammer merged 1 commit into main from export-html-conversion-to-markdown on Dec 28, 2025

Conversation

@TheOutdoorProgrammer
Contributor

Convert FFI to Return Markdown Instead of HTML

This PR updates the Convert FFI function to return Markdown output instead of cleaned HTML, making the library more useful for consuming applications that need Markdown format.

Motivation

The previous implementation returned sanitized HTML after stripping specified elements. However, most consuming applications (Python, Node.js, etc.) likely need Markdown format rather than HTML for display, storage, or further processing. By converting to Markdown in the FFI layer, we provide more useful output and leverage the existing HTMLToMarkdown converter functionality.

Changes

  • Modified Convert function in cmd/libschema/main.go to call converter.HTMLToMarkdown() on the processed HTML before returning
  • Fixed //export directive placement - moved it immediately before the function declaration (required for proper CGO export)
  • Enhanced function documentation to clarify the CGO array passing pattern
  • Updated Node.js FFI test to check for content presence ('This content should remain') instead of HTML tags ('<main>')
  • Updated Python FFI test with the same content-based assertion
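The content-based assertion from the last two bullets can be sketched as follows. This is a minimal illustration, not the repository's actual test code: `check_converted_output` is a hypothetical helper, and the two sample strings are made up to stand in for an HTML result and a Markdown result.

```python
EXPECTED_TEXT = "This content should remain"  # the text the updated tests look for


def check_converted_output(result: str) -> bool:
    """Return True if the surviving page content is present in the FFI result.

    Checking for the text itself, rather than for a tag such as '<main>',
    keeps the check valid whether the result is HTML or Markdown.
    """
    return EXPECTED_TEXT in result


# Made-up sample outputs, for illustration only.
old_html_result = "<main><p>This content should remain</p></main>"
new_markdown_result = "This content should remain\n"

print(check_converted_output(old_html_result))      # True
print(check_converted_output(new_markdown_result))  # True
print("<main>" in new_markdown_result)              # False: a tag-based check would now fail
```

A tag-based check such as `'<main>' in result` would fail against Markdown output even when the conversion is correct, which is presumably why the tests moved to matching on content.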

Features

  • Markdown Output: The FFI now returns Markdown format instead of HTML, making it more useful for documentation systems, static site generators, and content management applications
  • Graceful Fallback: If Markdown conversion fails, the function returns the original HTML input rather than failing completely
  • Element Stripping Still Works: The selective element removal functionality is preserved and happens before Markdown conversion
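The order of operations described above (strip first, convert second, fall back on failure) can be sketched as below. The real implementation is Go code in cmd/libschema/main.go and is not shown in this description, so this Python sketch only mirrors the described control flow. `strip_elements` and `html_to_markdown` are hypothetical stand-ins passed in as parameters, and the toy implementations exist only to make the example runnable.

```python
import re
from typing import Callable, Sequence


def convert(
    html: str,
    elements: Sequence[str],
    strip_elements: Callable[[str, Sequence[str]], str],
    html_to_markdown: Callable[[str], str],
) -> str:
    """Strip the requested elements, then convert the remainder to Markdown."""
    cleaned = strip_elements(html, elements)  # stripping happens before conversion
    try:
        return html_to_markdown(cleaned)
    except Exception:
        # Graceful fallback as described in the PR: return the original HTML input.
        return html


def toy_strip(html: str, elements: Sequence[str]) -> str:
    """Toy stand-in: remove each named element and its contents with a regex."""
    for name in elements:
        tag = re.escape(name)
        html = re.sub(rf"<{tag}\b[^>]*>.*?</{tag}>", "", html,
                      flags=re.DOTALL | re.IGNORECASE)
    return html


def toy_markdown(html: str) -> str:
    """Toy stand-in: drop the remaining tags; a real converter does far more."""
    return re.sub(r"<[^>]+>", "", html).strip()


def failing_markdown(html: str) -> str:
    """Toy stand-in that always fails, to exercise the fallback path."""
    raise ValueError("conversion failed")


page = "<main>This content should remain</main><footer>gone</footer>"
print(convert(page, ["footer", "nav"], toy_strip, toy_markdown))
print(convert(page, ["footer", "nav"], toy_strip, failing_markdown) == page)
```

Because the fallback returns HTML, callers that need to distinguish the two cases would have to inspect the result themselves; the description does not mention any error signal.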

Usage

The function signature remains unchanged, so existing call sites need no changes. The returned string, however, is now Markdown rather than HTML:

// Node.js example
const result = convert(html, ['footer', 'nav']);
// result is now Markdown instead of HTML

# Python example
result = lib.Convert(html_bytes, elements_array, len(elements))
# result is now Markdown instead of HTML
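
The Python call above passes a C array of strings together with its length. One way such arguments might be built with `ctypes` is sketched below. The names `html_bytes`, `elements_array`, and `elements` come from the example; `build_c_string_array` is a hypothetical helper. The library path, `argtypes`, and `restype` are not shown in this PR, so the library call itself is left as a comment.

```python
import ctypes
from typing import Sequence


def build_c_string_array(elements: Sequence[str]) -> ctypes.Array:
    """Build a C array of NUL-terminated strings (char*[]) from Python strings."""
    encoded = [name.encode("utf-8") for name in elements]
    array_type = ctypes.c_char_p * len(encoded)  # fixed-size array type
    return array_type(*encoded)


elements = ["footer", "nav"]
html_bytes = b"<main>This content should remain</main><footer>x</footer>"
elements_array = build_c_string_array(elements)

# With the shared library loaded (loading and signature setup are assumptions):
# result = lib.Convert(html_bytes, elements_array, len(elements))

print(len(elements_array))  # 2
print(elements_array[0])    # b'footer'
```

Passing the length separately is needed because a C array carries no size information of its own.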

Applications expecting HTML output should be updated to handle Markdown, or a separate HTML-returning function should be added if both formats are needed.

@TheOutdoorProgrammer TheOutdoorProgrammer merged commit 7fbdb50 into main Dec 28, 2025
3 checks passed
@TheOutdoorProgrammer TheOutdoorProgrammer deleted the export-html-conversion-to-markdown branch December 28, 2025 02:15