feat(fetchers): RSSFeedFetcher — structured feed parsing #59

@chaliy

Description

What

Add an RSSFeedFetcher that detects RSS/Atom feeds (via content-type or XML structure) and returns structured feed entries.

Why

Agents monitoring blogs, release feeds, changelogs, and news sources encounter RSS/Atom feeds. The current DefaultFetcher returns raw XML, which is hard for LLMs to parse effectively. A dedicated fetcher provides clean, structured entries.

Requirements

  • Detect via content-type: application/rss+xml, application/atom+xml, text/xml (with RSS/Atom root element)
  • Parse both RSS 2.0 and Atom 1.0 formats
  • Return: feed title, description, link, last updated
  • For each entry: title, link, published date, summary/description, author, categories
  • Format field: "rss_feed"
  • Limit entries to most recent N (configurable, default ~20)
  • Preserve HTML content in entry descriptions by running it through the existing html_to_markdown conversion
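
The output shape and the entry limit described above could be sketched roughly as follows. This is only an illustration under assumptions: the type names `Feed` and `FeedEntry`, the field names, the constant `DEFAULT_MAX_ENTRIES`, the helper `limit_to_most_recent`, and the choice to represent dates as Unix seconds are all hypothetical and not part of the existing codebase.

```rust
use std::cmp::Ordering;

/// Hypothetical structured entry; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct FeedEntry {
    pub title: String,
    pub link: String,
    /// Publication time as Unix seconds; `None` when the feed omits it.
    pub published: Option<i64>,
    /// Summary/description, assumed to be already converted to Markdown.
    pub summary: String,
    pub author: Option<String>,
    pub categories: Vec<String>,
}

/// Hypothetical feed-level result matching the fields listed in the requirements.
#[derive(Debug, Clone, PartialEq)]
pub struct Feed {
    pub title: String,
    pub description: String,
    pub link: String,
    pub last_updated: Option<i64>,
    pub entries: Vec<FeedEntry>,
}

/// Assumed default for the configurable limit (the issue says "~20").
pub const DEFAULT_MAX_ENTRIES: usize = 20;

/// Keep only the `max_entries` most recent entries.
/// Dated entries are ordered newest first; undated entries sort last.
pub fn limit_to_most_recent(mut entries: Vec<FeedEntry>, max_entries: usize) -> Vec<FeedEntry> {
    entries.sort_by(|a, b| match (a.published, b.published) {
        (Some(x), Some(y)) => y.cmp(&x), // descending by timestamp
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    });
    entries.truncate(max_entries);
    entries
}
```

Sorting before truncating is a design choice in this sketch: feeds usually list newest entries first, but neither RSS 2.0 nor Atom 1.0 guarantees that ordering.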

Design Notes

  • XML parsing needed — consider quick-xml crate (same as potential ArXiv dependency)
  • Content-type detection should happen early in the fetch pipeline
  • Could also detect RSS feed links in HTML pages (<link rel="alternate" type="application/rss+xml">) and offer to follow them
  • This fetcher detects by content-type rather than URL pattern — slightly different matching strategy
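
As a rough illustration of matching on content-type rather than URL pattern, the check could look something like the sketch below. It uses only the standard library and a deliberately naive root-element scan instead of a real XML parser such as quick-xml. The names `FeedKind`, `detect_feed`, and `root_element_name` are hypothetical, and accepting `application/xml` alongside `text/xml` is an assumption that goes beyond the requirements listed in this issue.

```rust
/// Hypothetical result of feed detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FeedKind {
    Rss,
    Atom,
}

/// Return the name of the first element in `body`, skipping a BOM,
/// the XML declaration, processing instructions, comments, and DOCTYPE.
/// Naive by design: a DOCTYPE with an internal subset containing `>` is not handled.
fn root_element_name(body: &str) -> Option<&str> {
    let mut rest = body.trim_start_matches('\u{feff}').trim_start();
    loop {
        if let Some(after) = rest.strip_prefix("<?") {
            let end = after.find("?>")?;
            rest = after[end + 2..].trim_start();
        } else if let Some(after) = rest.strip_prefix("<!--") {
            let end = after.find("-->")?;
            rest = after[end + 3..].trim_start();
        } else if let Some(after) = rest.strip_prefix("<!") {
            let end = after.find('>')?;
            rest = after[end + 1..].trim_start();
        } else if let Some(after) = rest.strip_prefix('<') {
            let end = after.find(|c: char| c.is_whitespace() || c == '>' || c == '/')?;
            return Some(&after[..end]);
        } else {
            return None;
        }
    }
}

/// Classify a response as RSS or Atom from its Content-Type header,
/// falling back to the root element for generic XML media types.
pub fn detect_feed(content_type: Option<&str>, body: &str) -> Option<FeedKind> {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    let media_type = content_type
        .and_then(|ct| ct.split(';').next())
        .map(|mt| mt.trim().to_ascii_lowercase())
        .unwrap_or_default();

    match media_type.as_str() {
        "application/rss+xml" => Some(FeedKind::Rss),
        "application/atom+xml" => Some(FeedKind::Atom),
        // "application/xml" is an assumption; the issue only names text/xml.
        "text/xml" | "application/xml" => {
            let name = root_element_name(body)?;
            // Ignore a namespace prefix such as "atom:feed".
            let local = name.rsplit(':').next().unwrap_or(name);
            match local {
                "rss" => Some(FeedKind::Rss),
                "feed" => Some(FeedKind::Atom),
                _ => None,
            }
        }
        _ => None,
    }
}
```

Because the decision depends on response headers and possibly the first bytes of the body, a check like this can only run after the request has been made, which is why it differs from fetchers that match on the URL alone.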

Tier

3 — Differentiated capability

Metadata

    Labels

    enhancement: New feature or request
