feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline by Copilot · Pull Request #31 · chinkan/RustFox

Copilot · 2026-03-25T10:16:42Z

Telegram only supported text messages. This adds handling for photos and file attachments (PDF, DOCX, images), routing them through a vision/OCR/document extraction pipeline before injecting context into the LLM.

Platform Layer

IncomingMessage gains attachments: Vec<Attachment> (kind: Image|Pdf|Docx|Other, path, mime_type)
telegram.rs now handles msg.photo() (highest-res variant) and msg.document(), downloading to per-request temp dirs cleaned up after processing

LLM Message Model (`llm.rs`)

ChatMessage.content promoted from Option<String> to MessageContent — either a plain string or Vec<ContentPart> (text/image_url)
Serializes as plain string when no images present (backward-compatible)

File Processor (`src/file_processor/`)

Images: if openrouter.supports_vision = true → base64-encode into multi-modal content parts; otherwise → OCR via ocrs (pure Rust, neural-network-based OCR engine — no system library dependencies) → inject as text
PDF: text extraction via pdf-extract
DOCX: text extraction via docx-rs
Long context (>6000 chars): text is chunked (1000-char chunks with overlap), stored via the existing EmbeddingEngine + sqlite-vec knowledge store, and top-K chunks are RAG-retrieved per user query — no new DB required

OCR

Uses ocrs — a pure Rust neural-network OCR engine backed by the rten inference runtime. Models (text detection + recognition) are downloaded automatically from S3 on first use and cached in ~/.cache/ocrs/. No native system library (e.g. Tesseract) required.

Config

[openrouter]
# supports_vision = false  # Set to true if your model supports image inputs

[ocr]
# model_dir = "~/.cache/ocrs"  # Where OCR model files are cached (downloaded on first use)

New Dependencies

Crate	Purpose
`ocrs`	OCR fallback for non-vision models (pure Rust, neural-network based)
`rten`	Neural network inference runtime for `ocrs` model files
`image`	Image loading/decoding
`pdf-extract`	PDF text extraction
`docx-rs`	DOCX parsing
`infer`	MIME type detection from bytes
`base64`	Base64 encoding for vision API image content parts

Original prompt

Use superpower to create a plan to add telegram file and image support, when receive file and image, store in temp and process supported format. For image, when default model is support visual, directly put the image to the model, else, research and find a best way in 2026 rust library ocr the image and put it in context to llm. For docx, pdf, research best method in rush for convert to markdown and also put as context to llm. If pdf or docx has image, extract also the image for answering questions, if the context too long, put it to local vector db (research best for vector in rust or use exists vector db) and pass user questions to vector and find result , also using exists embedded model chosen.

💬 Send tasks to Copilot coding agent from Slack and Teams to turn conversations into code. Copilot posts an update in your thread when it's finished.

chinkan · 2026-03-25T10:30:15Z

@copilot use ocrs https://github.com/robertknight/ocrs as ocr image instead of terract

…rs OCR Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add MessageContent enum supporting Text(String) and Parts(Vec<ContentPart>) to replace Option<String> in ChatMessage.content. Backwards-compatible: Text variant serializes as plain JSON string, Parts as JSON array. - Add ContentPart, ImageUrlContent, MessageContent types to llm.rs - Update ChatMessage.content: Option<String> → Option<MessageContent> - Fix all usage sites in agent.rs, memory/conversations.rs, memory/query_rewriter.rs, memory/summarizer.rs, memory/rag.rs - DB storage continues to use extracted plain text via as_text() - All 113 tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add `supports_vision` bool field to `OpenRouterConfig` (default false) - Add `OcrConfig` struct with `model_dir` (default $HOME/.cache/ocrs) - Add `ocr` field to `Config` with serde default - Add 3 tests: supports_vision defaults false, parses true, ocr default dir - Update config.example.toml with comments for new fields Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add src/file_processor/mod.rs with process_attachments() entry point - Image handling: base64 vision parts (supports_vision=true) or OCR via ocrs - OCR: auto-downloads text-detection/recognition .rten models on first use - PDF: text extraction via pdf_extract::extract_text_from_mem - DOCX: text extraction by traversing docx_rs document children - Long context (>6000 chars): chunk+store in MemoryStore, RAG-retrieve - Unit tests for chunk_text (4 cases) - Register module via mod file_processor; in src/main.rs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…comment on pdf unwrap_or_default Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace early-return for non-text messages with caption fallback - Add download_telegram_file helper (creates temp dir, fetches via Telegram API, detects MIME with infer crate) - Add classify_attachment_kind and mime_from_extension helpers - Collect photo and document attachments into Vec<Attachment> - Pass attachments to IncomingMessage instead of empty vec - Clean up temp dir after agent processing completes - Add 5 unit tests for classify_attachment_kind Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…match Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

Copilot · 2026-03-25T11:21:56Z

@copilot use ocrs https://github.com/robertknight/ocrs as ocr image instead of terract

Switched OCR from tesseract to ocrs (https://github.com/robertknight/ocrs) — pure Rust, neural-network-based, no system library dependencies. Implementation in src/file_processor/mod.rs uses ocrs::OcrEngine with rten::Model via spawn_blocking. Models are auto-downloaded from S3 on first use to ~/.cache/ocrs/. Committed in 12e9337.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

docs.rs
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Copilot

Pull request overview

Adds Telegram media (photos + documents) support by downloading attachments to temp storage, extracting usable text (OCR/PDF/DOCX) and/or sending images as multimodal content to vision-capable models, with long-document fallback to the existing knowledge/vector store.

Changes:

Extend platform message model with typed attachments and add Telegram download + cleanup logic for photos/documents.
Promote LLM ChatMessage.content to a backwards-compatible multimodal representation (text or parts).
Add a file processing pipeline (vision base64, OCR via ocrs, PDF/DOCX extraction, long-context chunking + RAG retrieval) and wire it into the agent.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
src/supervisor/backend/reasoning.rs	Populates new `attachments` field for supervisor-originated messages.
src/platform/telegram.rs	Downloads Telegram photos/documents into temp dirs, classifies attachments, cleans up after processing, adds unit tests.
src/platform/mod.rs	Introduces `AttachmentKind`, `Attachment`, and `IncomingMessage.attachments`.
src/memory/summarizer.rs	Adapts summarizer to new `MessageContent` and `as_text()` extraction.
src/memory/rag.rs	Updates RAG snippet building to use `MessageContent::as_text()`.
src/memory/query_rewriter.rs	Updates query rewriter to build prompts from `MessageContent::as_text()`.
src/memory/conversations.rs	Stores/retrieves text-only content while mapping DB rows to `MessageContent::Text`.
src/llm.rs	Adds multimodal content types (`MessageContent`, `ContentPart`) with backward-compatible serialization + tests.
src/lib.rs	Exposes new `file_processor` module.
src/learning.rs	Updates learning flows to use `MessageContent` consistently.
src/file_processor/mod.rs	New attachment processing module: vision/OCR, PDF/DOCX extraction, chunking + knowledge-store indexing/retrieval.
src/config.rs	Adds `openrouter.supports_vision` and `[ocr].model_dir` defaults + tests.
src/agent.rs	Integrates attachment processing into message construction and persistence behavior.
src/agent_prompt.rs	Adjusts prompt estimation/compaction logic for `MessageContent`.
docs/plans/2026-03-25-telegram-file-image-support.md	Adds an implementation plan documenting the intended architecture and dependencies.
config.example.toml	Documents new `supports_vision` and `[ocr]` settings.
Cargo.toml	Adds new crates for OCR/image/PDF/DOCX/mime/base64 support.
Cargo.lock	Locks new transitive dependencies for added crates.
.gitignore	Updates ignored DB filename pattern to `rustfox.db*`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    let dest_name = match filename {
+        Some(n) => n.to_string(),
+        None => format!("{}.{}", uuid::Uuid::new_v4(), ext),
+    };
+    let dest_path = dest_dir.join(&dest_name);


+    let ext = Path::new(&file.path)
+        .extension()
+        .and_then(|e| e.to_str())
+        .unwrap_or("bin");
+


+    dest_dir: &Path,
+    filename: Option<&str>,
+) -> Result<(PathBuf, String)> {
+    std::fs::create_dir_all(dest_dir).context("Failed to create temp directory")?;


+        .await
+        .context("Failed to download file from Telegram")?;
+
+    std::fs::write(&dest_path, &bytes).context("Failed to write downloaded file")?;


+    let response = reqwest::get(url)
+        .await
+        .context("Failed to fetch OCR model")?;
+    let bytes = response
+        .bytes()
+        .await
+        .context("Failed to read OCR model bytes")?;


+        // Save a text-only version to DB (avoid storing base64 image data in message history)
+        let db_content = if incoming.attachments.is_empty() {
+            user_msg_content.clone()
+        } else {
+            let mut db_text = incoming.text.clone();
+            if !attachment_text.is_empty() {
+                db_text.push_str("\n\n[Attachment processed]");
+            }
+            MessageContent::from_text(db_text)
+        };


+# Used to extract text from images when supports_vision = false.
+# Models are downloaded automatically on first use.
+# [ocr]
+# model_dir = "~/.cache/ocrs"  # Where OCR model files are cached (downloaded on first use)


+    tokio::task::spawn_blocking(move || -> Result<String> {
+        let detection_model =
+            rten::Model::load_file(&det_path).context("Failed to load OCR detection model")?;
+        let recognition_model =
+            rten::Model::load_file(&rec_path).context("Failed to load OCR recognition model")?;


+                match extract_pdf_text(&attachment.path) {
+                    Ok(text) => {
+                        let ctx = handle_context_length(&text, fname, user_query, memory).await;
+                        text_parts.push(ctx);
+                    }
+                    Err(e) => {
+                        tracing::warn!("PDF extraction failed: {}", e);
+                        text_parts.push(format!("[PDF processing failed: {}]", e));
+                    }
+                }


+                match extract_docx_text(&attachment.path) {
+                    Ok(text) => {
+                        let ctx = handle_context_length(&text, fname, user_query, memory).await;
+                        text_parts.push(ctx);
+                    }
+                    Err(e) => {
+                        tracing::warn!("DOCX extraction failed: {}", e);
+                        text_parts.push(format!("[DOCX processing failed: {}]", e));
+                    }
+                }


Initial plan

7f8a451

Copilot AI assigned Copilot and chinkan Mar 25, 2026

Copilot started work on behalf of chinkan March 25, 2026 10:16 View session

Copilot AI changed the title ~~[WIP] Add support for processing files and images from Telegram~~ feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline Mar 25, 2026

Copilot AI requested a review from chinkan March 25, 2026 10:21

Copilot finished work on behalf of chinkan March 25, 2026 10:21

Copilot started work on behalf of chinkan March 25, 2026 10:30 View session

Copilot AI and others added 12 commits March 25, 2026 10:39

docs: add implementation plan for telegram file/image support with oc…

73e65a2

…rs OCR Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

feat: add file processing dependencies

cbb559c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: add Attachment type to IncomingMessage

24f5a41

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: use to_string_lossy() in ocr test for UTF-8 safety

3c969f9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor: avoid double chars().count() in handle_context_length; add …

1215e53

…comment on pdf unwrap_or_default Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: use tokio::fs::remove_dir_all for async temp dir cleanup

3fc8d16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: agent processes file attachments

1f1fbb2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: add MessageContent serialization tests, fix clippy collapsible_…

12e9337

…match Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

Copilot finished work on behalf of chinkan March 25, 2026 11:23

chinkan added 2 commits June 4, 2026 11:12

Merge branch 'main' into copilot/add-telegram-file-image-support

41d1d7b

refactor: update content handling in ChatMessage and related modules

11e26cd

chinkan requested a review from Copilot June 4, 2026 03:20

Copilot started reviewing on behalf of chinkan June 4, 2026 03:20 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline#31

feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline#31
Copilot wants to merge 15 commits into
mainfrom
copilot/add-telegram-file-image-support

Copilot AI commented Mar 25, 2026 •

edited

Loading

Uh oh!

chinkan commented Mar 25, 2026

Uh oh!

Copilot AI commented Mar 25, 2026 •

edited

Loading

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Copilot AI commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Platform Layer

LLM Message Model (llm.rs)

File Processor (src/file_processor/)

OCR

Config

New Dependencies

Uh oh!

chinkan commented Mar 25, 2026

Uh oh!

Copilot AI commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 25, 2026 •

edited

Loading

LLM Message Model (`llm.rs`)

File Processor (`src/file_processor/`)

Copilot AI commented Mar 25, 2026 •

edited

Loading