Author: .elmoustache
Channel: #shelley
Link: https://discord.com/channels/1405685085923049482/1450334528210993295/1504387631038074951
Issue
When using models that don't support vision (e.g., DeepSeek), Shelley may insert images into the conversation, which makes the conversation unrecoverable. The agent should be aware of its own limitations and handle images more gracefully.
Expected Behavior
Instead of inserting images directly into conversations with models that lack vision support, the agent should:
- Avoid including images in the conversation context
- Use tools to work with images (e.g., OCR, object detection) when needed
- Be aware of and respect its own capability limits
- Skip images it cannot process, or handle them gracefully, rather than breaking the conversation
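The behavior listed above could be sketched roughly as follows. This is only an illustration, not Shelley's actual code: the message shape (a list of dicts whose "content" is a list of typed parts), the VISION_CAPABLE table, and the names supports_vision and strip_images are all assumptions made for this example.

```python
from typing import Any

# Hypothetical capability table. Shelley's real model registry is not
# described in this report, so these names are placeholders.
VISION_CAPABLE = {"example-vision-model": True, "example-text-model": False}


def supports_vision(model: str) -> bool:
    # Treat unknown models as text-only, which is the safer default.
    return VISION_CAPABLE.get(model, False)


def strip_images(messages: list[dict[str, Any]], model: str) -> list[dict[str, Any]]:
    """Replace image parts with text placeholders for text-only models."""
    if supports_vision(model):
        return messages  # nothing to do, the model can read images

    cleaned = []
    for msg in messages:
        parts = []
        for part in msg["content"]:
            if part.get("type") == "image":
                name = part.get("name", "unnamed")
                # Keep a trace of the image so the agent knows it exists
                # and can reach for a tool (OCR, etc.) if it needs it.
                parts.append({
                    "type": "text",
                    "text": f"[image omitted: {name}; model lacks vision support]",
                })
            else:
                parts.append(part)
        # Build a new message so the original history is left untouched.
        cleaned.append({**msg, "content": parts})
    return cleaned
```

Leaving a placeholder instead of silently dropping the image keeps the agent aware that an image exists, and running the filter over the whole history on every request would also let an already-affected conversation continue.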
Current Behavior
Images are inserted into conversations with non-vision models, causing the conversation to become unrecoverable.
Context
Discussion in replies suggested potential workarounds:
- Using a proxy that supports passing images to a vision model if the target model doesn't support it
- Using OCR and object detection as fallback approaches
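One way to combine these workarounds is a fallback chain that tries each describer in turn and passes only the resulting text to the non-vision model. The sketch below is hypothetical: the Describer type and describe_image function are invented for illustration, and the describers themselves (a vision-model proxy, an OCR tool, an object detector) are left as pluggable callables because the thread does not name specific tools.

```python
from typing import Callable, Optional

# A describer takes raw image bytes and returns text, or None if it
# has nothing useful to say. Real ones might wrap a vision proxy or OCR.
Describer = Callable[[bytes], Optional[str]]


def describe_image(image: bytes, describers: list[Describer]) -> str:
    """Return the first non-empty description, trying describers in order."""
    for describe in describers:
        try:
            text = describe(image)
        except Exception:
            # A failing tool should not take the conversation down with it.
            continue
        if text:
            return text
    # No describer worked: fall back to a plain-text marker.
    return "[image could not be described]"
```

Ordering the list as, say, vision proxy first and OCR second would prefer the richer description while still degrading to text-only output when the proxy is unavailable.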