Agentic Project Onboarding Guide

Welcome to the Agentic Project! This document will help you understand our project structure, capabilities, and the technology choices we've made. By the end, you should have a good understanding of how everything fits together and be ready to contribute.

Project Overview

This is a Multi-Agent Personal Financial Portal that uses AI agent design patterns to help users track and analyze their purchases. The system can:

- Process receipt images using OCR (Optical Character Recognition)
- Extract structured data from receipts (merchant, items, prices, etc.)
- Store purchase history
- Analyze spending patterns and provide financial insights
- Answer natural language queries about purchase history

Project Structure

The codebase follows a modular architecture organized as follows:

agentic-project/
├── app.py                 # Main application entry point
├── data/                  # Data storage directory
│   └── purchases.json     # Purchase history database
├── src/                   # Source code
│   ├── agents/            # Agent modules
│   │   ├── coordinator_agent.py    # Orchestrates other agents
│   │   └── receipt_reader_agent.py  # Processes receipt images
│   ├── tools/             # Tools used by agents
│   │   ├── memory_tools.py          # Tools for accessing memory
│   │   ├── receipt_processor_tool.py # Tool for processing receipts
│   │   └── receipt_tools.py         # Receipt OCR and parsing tools
│   └── utils/             # Utility modules
│       ├── image_utils.py            # Image processing utilities
│       └── memory.py                 # Purchase memory storage
├── streamlit_app/         # Streamlit web interface
│   └── app.py             # Streamlit application
└── tests/                 # Test suite
    ├── test_data/         # Test images and fixtures
    └── unit/              # Unit tests

Core Components

Agents

We use a multi-agent architecture where each agent has specific responsibilities:

Coordinator Agent (src/agents/coordinator_agent.py):
- Acts as the main entry point for user interactions
- Orchestrates other specialized agents
- Handles natural language queries about purchases
- Delegates specialized tasks to other agents
- Uses OpenAI's GPT models for natural language understanding

Receipt Reader Agent (src/agents/receipt_reader_agent.py):
- Extracts structured data from receipt images
- Uses Mistral AI's OCR capabilities to read text from images
- Implements the Tool Use pattern for leveraging LLM capabilities

Tools

Tools are specialized components that agents use to perform specific tasks:

Receipt Tools (src/tools/receipt_tools.py):
- MistralOCRTool: Extracts text from receipt images using Mistral's OCR
- ReceiptParserTool: Converts raw text to structured receipt data

Memory Tools (src/tools/memory_tools.py):
- MemoryTool: Provides access to purchase history
- InsightGeneratorTool: Generates financial insights from purchase history

Receipt Processor Tool (src/tools/receipt_processor_tool.py):
- Acts as a bridge between the receipt reader agent and memory
- Processes receipts and stores results in purchase memory
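
To make the "raw text to structured data" step more concrete, here is a minimal standard-library sketch. The receipt layout (merchant on the first line, then "name price" lines), the `parse_receipt_text` helper, and the `ParsedReceipt` fields are all assumptions for illustration; the project's ReceiptParserTool may work quite differently, for example by prompting an LLM.

```python
import re
from dataclasses import dataclass, field

# Hypothetical line format: "<item name> <price>", e.g. "Milk 2.49".
ITEM_LINE = re.compile(r"^(?P<name>.+?)\s+\$?(?P<price>\d+\.\d{2})$")


@dataclass
class ParsedReceipt:
    merchant: str
    items: list = field(default_factory=list)  # list of (name, price) tuples
    total: float = 0.0


def parse_receipt_text(raw_text: str) -> ParsedReceipt:
    """Parse OCR text, assuming the first non-empty line is the merchant."""
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty receipt text")
    receipt = ParsedReceipt(merchant=lines[0])
    for line in lines[1:]:
        match = ITEM_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like "name price"
        name, price = match["name"], float(match["price"])
        if name.lower().startswith("total"):
            receipt.total = price
        else:
            receipt.items.append((name, price))
    if receipt.total == 0.0:
        # Fall back to summing the items when no TOTAL line was found.
        receipt.total = round(sum(p for _, p in receipt.items), 2)
    return receipt
```

Real receipts are far messier than this format assumes, which is one reason to hand the parsing step to a language model rather than a regular expression.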

Memory

The memory system (src/utils/memory.py) stores and manages purchase history:

- Implements the Memory Pattern for persistence
- Uses data classes (Purchase and PurchaseItem) for type safety
- Provides filtering capabilities (by merchant, date, category)
- Integrates with LangChain's memory system for agent access
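
As a rough picture of how data classes and filtering might fit together, consider the sketch below. Only the class names Purchase and PurchaseItem come from this guide; the field names, the `total` property, and the `filter_purchases` helper are assumptions, so check src/utils/memory.py for the actual definitions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PurchaseItem:
    name: str
    price: float
    category: str = "uncategorized"  # assumed default, for illustration


@dataclass
class Purchase:
    merchant: str
    purchased_on: date
    items: List[PurchaseItem] = field(default_factory=list)

    @property
    def total(self) -> float:
        """Sum of item prices, rounded to cents."""
        return round(sum(item.price for item in self.items), 2)


def filter_purchases(
    purchases: List[Purchase],
    merchant: Optional[str] = None,
    since: Optional[date] = None,
    category: Optional[str] = None,
) -> List[Purchase]:
    """Return the purchases that match every filter that was provided."""
    result = []
    for purchase in purchases:
        if merchant and purchase.merchant.lower() != merchant.lower():
            continue
        if since and purchase.purchased_on < since:
            continue
        if category and not any(i.category == category for i in purchase.items):
            continue
        result.append(purchase)
    return result
```

Filters left as `None` are ignored, so the same helper covers lookups by merchant, by date, by category, or by any combination of them.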

Key Libraries and Why We Use Them

LangChain

LangChain is our primary framework for building LLM-powered applications. We use it because:

Agent Architecture: LangChain provides robust agent frameworks that enable LLMs to use tools, make decisions, and execute multi-step tasks
- AgentExecutor: Manages agent execution and tool usage
- create_openai_functions_agent: Creates OpenAI function-calling agents
- create_react_agent: Creates ReAct-style agents for reasoning and action

Memory Systems: LangChain offers memory components that help maintain context
- ConversationBufferMemory: Stores conversation history
- SimpleMemory and ReadOnlySharedMemory: Let us share data between components

Prompt Management: LangChain's prompt templates make it easy to create consistent interactions with LLMs
- ChatPromptTemplate and MessagesPlaceholder: Structure agent prompts
- System messages and human messages: Create proper conversation context

Tool Integration: LangChain's tools framework makes it easy to extend agent capabilities
- BaseTool: Base class for all our custom tools
- Tool registration: Automatically makes tools available to agents

Mistral AI

We use Mistral AI for:

- OCR Capabilities: Extract text from images through Mistral's OCR model
- Language Models: Process extracted text and generate structured data
- Multi-modal Understanding: Process both text and images in a single API call

OpenAI

We use OpenAI models for:

- Natural Language Understanding: Process user queries about their finances
- Agent Orchestration: The coordinator agent uses OpenAI models for high-level reasoning
- Financial Insights: Generate useful financial insights from purchase history

Agent Design Patterns

Our system implements several agent design patterns:

Coordinator Pattern: The coordinator agent orchestrates specialized agents to solve complex tasks
Tool Use Pattern: Agents use specialized tools to extend their capabilities beyond just text generation
Memory Pattern: The system maintains persistent memory of purchase history and user interactions
ReAct Pattern: Agents follow a "Reasoning and Acting" cycle where they:
- Think about what to do next
- Choose an action (tool to use)
- Observe the result
- Plan the next step
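
The ReAct cycle above can be sketched as a plain loop. This is a simplified stand-in, not the project's implementation: `scripted_policy` replaces the LLM with hard-coded decisions, and `react_loop`, the `lookup_total` tool name, and the `finish` action are hypothetical names chosen for this example. In the project, LangChain's agent machinery plays this role.

```python
from typing import Callable, Dict, List, Tuple

# A tool maps a string input to a string observation.
Tool = Callable[[str], str]
# One history entry: (thought, action, observation).
Step = Tuple[str, str, str]


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not history:
        return ("I need the purchase total first.", "lookup_total", "groceries")
    last_observation = history[-1][2]
    return ("I have what I need.", "finish",
            f"You spent {last_observation} on groceries.")


def react_loop(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Tool],
    max_steps: int = 5,
) -> str:
    """Alternate reasoning and acting until the policy chooses 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        # Think about what to do next and choose an action.
        thought, action, action_input = policy(question, history)
        if action == "finish":
            return action_input
        # Run the chosen tool and observe the result.
        observation = tools[action](action_input)
        # The recorded step informs the next round of planning.
        history.append((thought, action, observation))
    raise RuntimeError("no answer within max_steps")
```

The `max_steps` cap is a common safeguard in loops like this, since a model-driven policy is not guaranteed to ever choose to finish.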

Workflow Examples

Processing a Receipt

When a user submits a receipt image:

- The coordinator agent receives the image path
- It delegates to the receipt reader agent
- The receipt reader uses OCR to extract text from the image
- The extracted text is parsed into structured data
- The structured purchase data is stored in memory
- The purchase data is returned to the user
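
The receipt workflow above can be sketched as a small pipeline with the OCR, parsing, and storage steps passed in as plain functions. Everything here is hypothetical: `process_receipt`, `fake_ocr`, and `simple_parse` are illustrative names, the OCR step is a stub returning canned text rather than a call to Mistral, and a list stands in for the purchase memory.

```python
from typing import Callable, Dict, List


def process_receipt(
    image_path: str,
    ocr: Callable[[str], str],
    parse: Callable[[str], Dict],
    store: List[Dict],
) -> Dict:
    """Run OCR, parse the text, store the purchase, and return it."""
    raw_text = ocr(image_path)   # extract text from the image
    purchase = parse(raw_text)   # turn the text into structured data
    store.append(purchase)       # keep the purchase in memory
    return purchase              # hand the result back to the caller


def fake_ocr(image_path: str) -> str:
    """Stub standing in for a real OCR call; ignores the path."""
    return "Corner Market\nMilk 2.49"


def simple_parse(raw_text: str) -> Dict:
    """Assumes the merchant on line one, then 'name price' lines."""
    merchant, *item_lines = raw_text.splitlines()
    items = []
    for line in item_lines:
        name, price = line.rsplit(" ", 1)
        items.append({"name": name, "price": float(price)})
    return {"merchant": merchant, "items": items}
```

Passing the steps in as arguments keeps the pipeline testable without network access, which is one plausible reason to separate the reader agent from its tools.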

Answering User Queries

When a user asks about their spending:

- The coordinator agent processes the natural language query
- It determines what information is needed
- It uses the memory tool to access relevant purchase data
- It formulates a helpful response based on the retrieved data
- For complex insights, it may use the insight generator tool
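
To illustrate the retrieve-then-respond flow above, here is a deliberately simple sketch. In the project the coordinator relies on an LLM to interpret the query; the keyword match below is only a stand-in, and `total_by_merchant`, `answer_spending_query`, and the dictionary shape of each purchase are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def total_by_merchant(purchases: List[Dict]) -> Dict[str, float]:
    """Aggregate purchase totals per merchant."""
    totals = defaultdict(float)
    for purchase in purchases:
        totals[purchase["merchant"]] += purchase["total"]
    return {merchant: round(total, 2) for merchant, total in totals.items()}


def answer_spending_query(query: str, purchases: List[Dict]) -> str:
    """Answer by merchant if one is named in the query, else give the grand total."""
    totals = total_by_merchant(purchases)
    for merchant, total in totals.items():
        if merchant.lower() in query.lower():
            return f"You spent ${total:.2f} at {merchant}."
    grand_total = round(sum(totals.values()), 2)
    return f"You spent ${grand_total:.2f} in total."
```

A usage example under the same assumptions:

```python
history = [
    {"merchant": "Corner Market", "total": 5.49},
    {"merchant": "Book Nook", "total": 12.00},
]
print(answer_spending_query("How much did I spend at Book Nook?", history))
```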

Getting Started

To set up your development environment:

Install dependencies: pip install -r requirements.txt
Set up environment variables:
- MISTRAL_API_KEY: Your Mistral AI API key
- OPENAI_API_KEY: Your OpenAI API key
Run the application: python app.py
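
Before running the application, it can help to confirm that both keys are visible to Python. The helper below is a small convenience sketch, not part of the codebase; `missing_api_keys` is a hypothetical name.

```python
import os

REQUIRED_KEYS = ("MISTRAL_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(environ=os.environ):
    """Return the names of required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]
```

Calling `missing_api_keys()` with no arguments checks the current process environment; an empty list means both variables are set.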
