Handle images gracefully for vision-unsupported models #209

@exedev-shelley

Description

Author: .elmoustache
Channel: #shelley
Link: https://discord.com/channels/1405685085923049482/1450334528210993295/1504387631038074951


Issue

When using models that don't support vision (e.g., DeepSeek), Shelley may insert images into the conversation, which leaves the conversation in an unrecoverable state. The agent should be aware of its own limitations and handle images more gracefully.

Expected Behavior

Instead of inserting images directly into conversations with vision-unsupported models, the agent should:

  1. Avoid including images in the conversation context
  2. Use tools to work with images (e.g., OCR, object detection) when needed
  3. Be aware of and respect its own capability limits
  4. Gracefully skip or handle images appropriately
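Points 1 and 4 could look roughly like the sketch below. It is not Shelley's code: the `Message`, `TextPart`, `ImagePart` and `ModelInfo` types and the `supports_vision` flag are hypothetical stand-ins for whatever the agent uses to represent conversation history and model capabilities. The idea is to filter the history just before each request, so a model without vision never receives an image part. Each image is replaced by a short text note, so the model still knows an image existed.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    media_type: str
    data: bytes


Part = Union[TextPart, ImagePart]


@dataclass
class Message:
    role: str
    parts: List[Part] = field(default_factory=list)


@dataclass
class ModelInfo:
    # Hypothetical capability record; a real agent would load this from its model registry.
    name: str
    supports_vision: bool


def sanitize_for_model(messages: List[Message], model: ModelInfo) -> List[Message]:
    """Return a history that is safe to send to `model`.

    Vision-capable models get the original list unchanged. For other models,
    each image part is replaced with a text placeholder. The stored history is
    not mutated, so switching back to a vision model later still sees the images.
    """
    if model.supports_vision:
        return messages
    cleaned: List[Message] = []
    for msg in messages:
        new_parts: List[Part] = []
        for part in msg.parts:
            if isinstance(part, ImagePart):
                new_parts.append(TextPart(
                    f"[image omitted: {part.media_type}, {len(part.data)} bytes; "
                    f"model {model.name} does not support image input]"
                ))
            else:
                new_parts.append(part)
        cleaned.append(Message(msg.role, new_parts))
    return cleaned
```

Filtering at request time, rather than refusing to store images, keeps the conversation recoverable whichever model is selected next. The model name in any usage is only illustrative.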

Current Behavior

Images are inserted into conversations with non-vision models, causing the conversation to become unrecoverable.

Context

Replies in the discussion suggested possible workarounds:

  • Using a proxy that forwards images to a vision-capable model when the target model does not support them
  • Using OCR and object detection as fallback approaches
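The workarounds above could be combined into a fallback chain. In this sketch, each image is passed to a list of "describer" functions in turn, and the first non-empty text result replaces the image. A describer might run OCR locally or forward the image to a separate vision model through a proxy. Both describers in this sketch are hypothetical placeholders, not existing Shelley features or a specific library's API. If every strategy fails, the image is dropped with a note instead of being sent raw.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    media_type: str
    data: bytes


Part = Union[TextPart, ImagePart]

# Hypothetical describer signature: image in, plain-text description out.
Describer = Callable[[ImagePart], str]


def images_to_text(parts: List[Part], describers: List[Describer]) -> List[Part]:
    """Replace each image with text from the first describer that succeeds."""
    out: List[Part] = []
    for part in parts:
        if not isinstance(part, ImagePart):
            out.append(part)
            continue
        description: Optional[str] = None
        for describe in describers:
            try:
                description = describe(part)
            except Exception:
                continue  # this strategy failed; try the next one
            if description:
                break  # a non-empty description was produced
        if description:
            out.append(TextPart(f"[image description: {description}]"))
        else:
            out.append(TextPart("[image omitted: no description available]"))
    return out
```

Putting the more capable strategy (such as a vision proxy) first and cheaper ones (such as OCR) later is one plausible ordering. The right order depends on cost and on what kinds of images the agent actually encounters.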
