✨ Feature Summary
Implement a multimodal parser that supports multiple modes (with and without LLM support). With such a parser, quantmind can easily convert video, audio, image, text, and file objects into a ParserResult.
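# The mock parser result
from typing import Dict
from pydantic import BaseModel

class ParserResult(BaseModel):
    text: str  # extracted text
    images: Dict[str, bytes]  # string key -> raw image bytes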
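To make the "with and without LLM" idea concrete, here is a minimal sketch of how such a parser might return a ParserResult. Everything except `ParserResult` and its two fields is an assumption made for illustration: the names `ParseMode`, `MultimodalParser`, `parse_text`, `parse_image`, and the `llm` callable are hypothetical and not part of quantmind. The sketch uses a stdlib dataclass as a stand-in for the pydantic model so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class ParseMode(Enum):
    """Hypothetical mode switch: parse with or without an LLM."""
    RULE_BASED = "rule_based"  # no LLM involved
    LLM = "llm"                # LLM-powered post-processing


@dataclass
class ParserResult:
    # Stdlib stand-in for the pydantic ParserResult model (assumption).
    text: str
    images: Dict[str, bytes] = field(default_factory=dict)


class MultimodalParser:
    """Hypothetical parser; only text and image inputs are sketched."""

    def __init__(
        self,
        mode: ParseMode = ParseMode.RULE_BASED,
        llm: Optional[Callable[[str], str]] = None,
    ) -> None:
        # The LLM is injected as a plain callable so no provider is assumed.
        if mode is ParseMode.LLM and llm is None:
            raise ValueError("LLM mode requires an `llm` callable")
        self.mode = mode
        self.llm = llm

    def parse_text(self, raw: str) -> ParserResult:
        # Rule-based step: collapse runs of whitespace into single spaces.
        text = " ".join(raw.split())
        if self.mode is ParseMode.LLM:
            # Optional LLM step: let the callable rewrite the cleaned text.
            text = self.llm(text)
        return ParserResult(text=text)

    def parse_image(self, name: str, data: bytes) -> ParserResult:
        # Images are passed through untouched, keyed by the given name.
        return ParserResult(text="", images={name: data})
```

Under these assumptions, `MultimodalParser().parse_text(raw)` covers the non-LLM path, while `MultimodalParser(ParseMode.LLM, llm=my_callable)` routes the cleaned text through whatever model wrapper the caller supplies. Video, audio, and file inputs would need their own methods and are left out here.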
@bridgeqiqi I'd love for you to add the remaining parts! ✨ (feel free to remove useless parts)
🎯 Motivation
📋 Detailed Description
🔧 Proposed Implementation
API Design
# If applicable, show how you envision the API would look
Configuration
# If applicable, show any new configuration options
🎨 User Experience
📊 Use Cases
- Use Case 1:
- Use Case 2:
- Use Case 3:
🔗 Related Issues
- Relates to #
- Blocks #
- Depends on #
Implementation Considerations
Breaking Changes
Dependencies
Checklist