Welcome to the Agentic Project! This document will help you understand our project structure, capabilities, and the technology choices we've made. By the end, you should have a good understanding of how everything fits together and be ready to contribute.
This is a Multi-Agent Personal Financial Portal that uses AI agent design patterns to help users track and analyze their purchases. The system can:
- Process receipt images using OCR (Optical Character Recognition)
- Extract structured data from receipts (merchant, items, prices, etc.)
- Store purchase history
- Analyze spending patterns and provide financial insights
- Answer natural language queries about purchase history
The codebase follows a modular architecture organized as follows:
agentic-project/
├── app.py                             # Main application entry point
├── data/                              # Data storage directory
│   └── purchases.json                 # Purchase history database
├── src/                               # Source code
│   ├── agents/                        # Agent modules
│   │   ├── coordinator_agent.py       # Orchestrates other agents
│   │   └── receipt_reader_agent.py    # Processes receipt images
│   ├── tools/                         # Tools used by agents
│   │   ├── memory_tools.py            # Tools for accessing memory
│   │   ├── receipt_processor_tool.py  # Tool for processing receipts
│   │   └── receipt_tools.py           # Receipt OCR and parsing tools
│   └── utils/                         # Utility modules
│       ├── image_utils.py             # Image processing utilities
│       └── memory.py                  # Purchase memory storage
├── streamlit_app/                     # Streamlit web interface
│   └── app.py                         # Streamlit application
└── tests/                             # Test suite
    ├── test_data/                     # Test images and fixtures
    └── unit/                          # Unit tests
We use a multi-agent architecture where each agent has specific responsibilities:
- Coordinator Agent (src/agents/coordinator_agent.py):
  - Acts as the main entry point for user interactions
  - Orchestrates other specialized agents
  - Handles natural language queries about purchases
  - Delegates specialized tasks to other agents
  - Uses OpenAI's GPT models for natural language understanding
- Receipt Reader Agent (src/agents/receipt_reader_agent.py):
  - Extracts structured data from receipt images
  - Uses Mistral AI's OCR capabilities to read text from images
  - Implements the Tool Use pattern for leveraging LLM capabilities
Tools are specialized components that agents use to perform specific tasks:
- Receipt Tools (src/tools/receipt_tools.py):
  - MistralOCRTool: Extracts text from receipt images using Mistral's OCR
  - ReceiptParserTool: Converts raw text to structured receipt data
- Memory Tools (src/tools/memory_tools.py):
  - MemoryTool: Provides access to purchase history
  - InsightGeneratorTool: Generates financial insights from purchase history
- Receipt Processor Tool (src/tools/receipt_processor_tool.py):
  - Acts as a bridge between the receipt reader agent and memory
  - Processes receipts and stores results in purchase memory
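To make the tool roles above concrete, here is a minimal, dependency-free sketch of a parser-style tool and a name-based registry. It is an illustration only: the project's tools build on LangChain's BaseTool, while the `Tool` protocol, `ToyReceiptParser`, `register`, and the assumed "description price" line format below are all hypothetical stand-ins so the example runs without LangChain installed.

```python
import re
from dataclasses import dataclass
from typing import Protocol


class Tool(Protocol):
    """Hypothetical tool interface (a stand-in, not LangChain's BaseTool)."""

    name: str

    def run(self, tool_input: str) -> dict: ...


@dataclass
class ToyReceiptParser:
    """Hypothetical parser: turns raw OCR text into a structured dict."""

    name: str = "receipt_parser"

    def run(self, tool_input: str) -> dict:
        lines = [ln.strip() for ln in tool_input.splitlines() if ln.strip()]
        merchant = lines[0] if lines else ""
        items = []
        for ln in lines[1:]:
            # Assumed line shape: "<description> <price>", e.g. "Milk 2.49"
            match = re.fullmatch(r"(.+?)\s+(\d+\.\d{2})", ln)
            if match:
                items.append({"name": match.group(1), "price": float(match.group(2))})
        total = round(sum(item["price"] for item in items), 2)
        return {"merchant": merchant, "items": items, "total": total}


def register(*tools: Tool) -> dict[str, Tool]:
    """Index tools by name so an agent can look them up."""
    return {tool.name: tool for tool in tools}


registry = register(ToyReceiptParser())
parsed = registry["receipt_parser"].run("Corner Market\nMilk 2.49\nBread 3.00\n")
```

Keeping every tool behind the same `run` method is what lets an agent choose a tool by name without knowing how it works internally.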
The memory system (src/utils/memory.py) stores and manages purchase history:
- Implements the Memory Pattern for persistence
- Uses data classes (Purchase and PurchaseItem) for type safety
- Provides filtering capabilities (by merchant, date, category)
- Integrates with LangChain's memory system for agent access
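The sketch below shows one way data classes and filtering could fit together. Only the class names Purchase and PurchaseItem come from this document; the field names, the `PurchaseMemory` class, and its `filter` and `to_json` methods are assumptions made for illustration and may differ from what src/utils/memory.py actually defines.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PurchaseItem:
    name: str
    price: float
    category: str = "uncategorized"  # assumed default


@dataclass
class Purchase:
    merchant: str
    purchased_on: date
    items: list[PurchaseItem] = field(default_factory=list)

    @property
    def total(self) -> float:
        return round(sum(item.price for item in self.items), 2)


class PurchaseMemory:
    """Hypothetical in-memory store with simple filters."""

    def __init__(self) -> None:
        self._purchases: list[Purchase] = []

    def add(self, purchase: Purchase) -> None:
        self._purchases.append(purchase)

    def filter(
        self,
        merchant: Optional[str] = None,
        since: Optional[date] = None,
        category: Optional[str] = None,
    ) -> list[Purchase]:
        result = []
        for p in self._purchases:
            if merchant is not None and p.merchant.lower() != merchant.lower():
                continue
            if since is not None and p.purchased_on < since:
                continue
            if category is not None and not any(i.category == category for i in p.items):
                continue
            result.append(p)
        return result

    def to_json(self) -> str:
        # default=str serializes date objects as ISO-formatted strings
        return json.dumps([asdict(p) for p in self._purchases], default=str)


purchase_memory = PurchaseMemory()
purchase_memory.add(
    Purchase("Corner Market", date(2024, 1, 5), [PurchaseItem("Milk", 2.49, "groceries")])
)
purchase_memory.add(
    Purchase("Book Nook", date(2024, 2, 1), [PurchaseItem("Novel", 12.00, "books")])
)
groceries = purchase_memory.filter(category="groceries")
recent = purchase_memory.filter(since=date(2024, 1, 31))
```

Typed records make a malformed receipt fail when it is constructed rather than later, when an agent reads it back.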
LangChain is our primary framework for building LLM-powered applications. We use it because:
- Agent Architecture: LangChain provides robust agent frameworks that enable LLMs to use tools, make decisions, and execute multi-step tasks
  - AgentExecutor: Manages agent execution and tool usage
  - create_openai_functions_agent: Creates OpenAI function-calling agents
  - create_react_agent: Creates ReAct-style agents for reasoning and action
- Memory Systems: LangChain offers memory components that help maintain context
  - ConversationBufferMemory: Stores conversation history
  - SimpleMemory and ReadOnlySharedMemory: Let us share data between components
- Prompt Management: LangChain's prompt templates make it easy to create consistent interactions with LLMs
  - ChatPromptTemplate and MessagesPlaceholder: Structure agent prompts
  - System messages and human messages: Create proper conversation context
- Tool Integration: LangChain's tools framework makes it easy to extend agent capabilities
  - BaseTool: Base class for all our custom tools
  - Tool registration: Automatically makes tools available to agents
We use Mistral AI for:
- OCR Capabilities: Extract text from images through their OCR model
- LLM Models: Process extracted text and generate structured data
- Multi-modal Understanding: Process both text and images in a single API call
We use OpenAI models for:
- Natural Language Understanding: Process user queries about their finances
- Agent Orchestration: The coordinator agent uses OpenAI models for high-level reasoning
- Financial Insights: Generate useful financial insights from purchase history
Our system implements several agent design patterns:
- Coordinator Pattern: The coordinator agent orchestrates specialized agents to solve complex tasks
- Tool Use Pattern: Agents use specialized tools to extend their capabilities beyond just text generation
- Memory Pattern: The system maintains persistent memory of purchase history and user interactions
- ReAct Pattern: Agents follow a "Reasoning and Acting" cycle where they:
  - Think about what to do next
  - Choose an action (tool to use)
  - Observe the result
  - Plan the next step
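The ReAct cycle is easier to follow as a loop. In the project this loop is driven by an LLM through LangChain's agent machinery; the sketch below replaces the model with a scripted policy so it runs offline. `TOOLS`, `scripted_policy`, `react_loop`, and the sample merchant data are all hypothetical.

```python
from typing import Callable

# Hypothetical tools: plain functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_total": lambda merchant: {"Corner Market": "5.49"}.get(merchant, "0.00"),
}


def scripted_policy(question: str, observations: list[str]) -> tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not observations:
        return ("I need the stored total for this merchant.", "lookup_total", question)
    return ("I have what I need.", "finish", f"You spent ${observations[-1]}.")


def react_loop(question: str, max_steps: int = 5) -> tuple[str, list[str]]:
    observations: list[str] = []
    trace: list[str] = []
    for _ in range(max_steps):
        # Think, then choose an action
        thought, action, action_input = scripted_policy(question, observations)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            return action_input, trace
        # Act and observe; the next iteration plans from the new observation
        observation = TOOLS[action](action_input)
        trace.append(f"Action: {action}({action_input!r}) -> Observation: {observation}")
        observations.append(observation)
    return "Stopped: step limit reached.", trace


answer, trace = react_loop("Corner Market")
```

The `max_steps` cap matters in practice: without one, a model that never chooses to finish would loop indefinitely.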
When a user submits a receipt image:
- The coordinator agent receives the image path
- It delegates to the receipt reader agent
- The receipt reader uses OCR to extract text from the image
- The extracted text is parsed into structured data
- The structured purchase data is stored in memory
- The purchase data is returned to the user
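The receipt flow above can be sketched end to end with stubs in place of the external services. Nothing here calls Mistral or OpenAI: `fake_ocr` returns canned text, `simple_parse` assumes a trivial "name price" line format, and the `ReceiptReader` and `Coordinator` classes and the image path are hypothetical simplifications of the real agents.

```python
from typing import Callable

Ocr = Callable[[str], str]
Parser = Callable[[str], dict]


def fake_ocr(image_path: str) -> str:
    """Stub standing in for the OCR call; ignores the path and returns canned text."""
    return "Corner Market\nMilk 2.49\nBread 3.00"


def simple_parse(text: str) -> dict:
    """Assumes the first line is the merchant and each later line is '<name> <price>'."""
    merchant, *rows = text.splitlines()
    items = []
    for row in rows:
        name, price = row.rsplit(" ", 1)
        items.append({"name": name, "price": float(price)})
    return {"merchant": merchant, "items": items}


class ReceiptReader:
    def __init__(self, ocr: Ocr, parse: Parser) -> None:
        self.ocr = ocr
        self.parse = parse

    def read(self, image_path: str) -> dict:
        # OCR the image, then parse the extracted text into structured data
        return self.parse(self.ocr(image_path))


class Coordinator:
    def __init__(self, reader: ReceiptReader, store: list[dict]) -> None:
        self.reader = reader
        self.store = store

    def submit_receipt(self, image_path: str) -> dict:
        purchase = self.reader.read(image_path)  # receive the path and delegate
        self.store.append(purchase)              # store the structured purchase
        return purchase                          # return it to the user


store: list[dict] = []
coordinator = Coordinator(ReceiptReader(fake_ocr, simple_parse), store)
result = coordinator.submit_receipt("receipts/example.jpg")  # hypothetical path
```

Passing the OCR and parsing steps in as callables keeps the flow testable without API keys, since tests can swap in stubs like the ones shown.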
When a user asks about their spending:
- The coordinator agent processes the natural language query
- It determines what information is needed
- It uses the memory tool to access relevant purchase data
- It formulates a helpful response based on the retrieved data
- For complex insights, it may use the insight generator tool
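As a rough illustration of the query flow, the sketch below aggregates spending by category and builds a reply. Simple keyword matching stands in for the LLM's decision about what data is needed, and the record layout, `spend_by_category`, and `answer_query` are assumptions rather than the project's actual tools.

```python
from collections import defaultdict

# Hypothetical purchase records, as the memory tool might return them.
purchases = [
    {"merchant": "Corner Market", "category": "groceries", "total": 5.49},
    {"merchant": "Book Nook", "category": "books", "total": 12.00},
    {"merchant": "Corner Market", "category": "groceries", "total": 8.01},
]


def spend_by_category(records: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["category"]] += record["total"]
    return {category: round(amount, 2) for category, amount in totals.items()}


def answer_query(query: str, records: list[dict]) -> str:
    """Keyword routing stands in for the LLM's interpretation of the query."""
    totals = spend_by_category(records)
    for category, amount in totals.items():
        if category in query.lower():
            return f"You spent ${amount:.2f} on {category}."
    overall = round(sum(totals.values()), 2)
    return f"You spent ${overall:.2f} in total."


grocery_reply = answer_query("How much did I spend on groceries?", purchases)
overall_reply = answer_query("What did I spend this year?", purchases)
```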
To set up your development environment:
- Install dependencies: pip install -r requirements.txt
- Set up environment variables:
  - MISTRAL_API_KEY: Your Mistral AI API key
  - OPENAI_API_KEY: Your OpenAI API key
- Run the application: python app.py
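Before running the application, it can help to confirm that both API keys are set. The helper below is a small sketch and not part of the codebase; `REQUIRED_KEYS` and `missing_keys` are hypothetical names.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("MISTRAL_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with an explicit mapping; call missing_keys() to check the real environment.
missing = missing_keys({"OPENAI_API_KEY": "placeholder"})
```

Checking at startup turns a missing key into a clear message instead of an authentication error partway through a request.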