SmudgeAI

SmudgeAI is a Windows-native autonomous desktop AI agent that understands screen context through vision models and UI element trees, enabling it to navigate any application and execute complex multi-step workflows with minimal human intervention. It leverages LLM-based reasoning to adapt to unknown interfaces while maintaining security through built-in permission systems and safeguards.

Overview
Architecture
Core Components
AI Engine & Multimodal Processing
Desktop Automation Stack
UI Detection & Element Matching
Security & Safety Systems
Error Handling & Recovery
Multi-Monitor & DPI Scaling Support
Internationalization & Localization
Logging & Observability
Capabilities Summary
Comparison with OpenClaw
Bug Fixes & Security Hardening
Getting Started
Configuration
Roadmap

Overview

SmudgeAI is designed to be a fully autonomous desktop control agent that can:

Understand screen context through vision models and UI element trees
Navigate any Windows application using UIA-based element discovery
Execute multi-step workflows with verification and rollback
Adapt to unknown interfaces through LLM-based reasoning
Operate safely with permission systems and safeguards

Key Design Goals

Reliability First - Every action is verified before proceeding
Security by Default - Permission systems and input sanitization on all dangerous operations
Cross-UI Adaptability - Works with any Windows application through UIA and CV fallback
Production Ready - Structured logging, error recovery, and observability

Architecture

High-Level System Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                              SmudgeAI Architecture                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────────────────────────────────────┐     │
│  │    GUI       │────▶│              AI Engine                        │     │
│  │  (PyQt5)     │     │  ┌─────────────┐  ┌─────────────────────┐   │     │
│  │              │     │  │ Groq Client │  │ Gemini Client        │   │     │
│  │  - Input     │     │  │ (Primary)   │  │ (Fallback)          │   │     │
│  │  - Display   │     │  └─────────────┘  └─────────────────────┘   │     │
│  │  - Status    │     │  ┌─────────────────────────────────────┐   │     │
│  └──────────────┘     │  │ Rate Limiter & Model Cycling          │   │     │
│         │              │  └─────────────────────────────────────┘   │     │
│         │              │  ┌─────────────────────────────────────┐   │     │
│         ▼              │  │ Conversation History Manager        │   │     │
│  ┌──────────────────────────────────────────────────────────────┐ │     │
│  │                    Task Manager                               │ │     │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐    │ │     │
│  │  │ Permission  │  │ Tool       │  │ Error              │    │ │     │
│  │  │ System      │  │ Registry   │  │ Classifier         │    │ │     │
│  │  └────────────┘  └────────────┘  └────────────────────┘    │ │     │
│  └──────────────────────────────────────────────────────────────┘ │     │
│                              │                                        │
│         ┌────────────────────┼────────────────────┐                 │
│         ▼                    ▼                    ▼                    │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐        │
│  │ Desktop     │    │ CV/UI       │    │ Local VLM       │        │
│  │ State       │    │ Integration │    │ (Optional)      │        │
│  │ (UIA)       │    │             │    │                 │        │
│  └─────────────┘    └─────────────┘    └─────────────────┘        │
│                              │                                        │
│                              ▼                                        │
│                     ┌─────────────────┐                              │
│                     │  Action         │                              │
│                     │  Execution      │                              │
│                     │  (pyautogui)    │                              │
│                     └─────────────────┘                              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Component Responsibilities

Component	Responsibility	Language/Framework
GUI	User input, status display, permission dialogs	PyQt5
AI Engine	LLM orchestration, vision analysis, tool calling	Python (async)
Task Manager	Tool execution, permission checks, caching	Python
Desktop State	UIA element tree, window management	pywinauto + pygetwindow
CV/UI Integration	Template matching, LLM coordinate verification	OpenCV + PIL
Error Handler	Error classification, retry strategies	Python
Structured Logging	Correlation IDs, action tracking	Python (custom)

Core Components

1. AI Engine (`ai_engine.py`)

The AI Engine is the brain of SmudgeAI, orchestrating all LLM interactions.

Features

Multi-Provider Support: Groq (primary), Google Gemini (fallback)
Vision Analysis: Llama 3.2 Vision (Groq) → Gemini 1.5 Flash fallback
Model Cycling: Automatic failover when rate limits hit
Rate Limiting: Built-in rate limiter with exponential backoff
Conversation History: Properly serialized message history (dict-based)
Tool Schema Generation: Dynamic tool schema from Python functions

Rate Limiter Implementation

class RateLimiter:
    requests_in_window: int      # Requests in current window
    window_size: float = 60.0    # 60-second window
    max_requests: int = 30       # Max 30 requests per window
    blocked_until: float         # Timestamp when block expires
    consecutive_errors: int       # Track consecutive failures

Backoff Strategy:

Base delay: 30 seconds
Exponential: 2^consecutive_errors
Jitter: Random 0-10 seconds
Max block: 300 seconds (5 minutes)

Vision Pipeline

Screenshot → Groq Llama 3.2 Vision → JSON Elements → Coordinate Verification → Click
                ↓ (fallback)
         Gemini 1.5 Flash Vision

2. Desktop State (`desktop_state.py`)

Captures and maintains the UI element hierarchy of all windows.

COM Threading Fix

Previously, pywinauto initialization caused race conditions. Now:

def _ensure_pywinauto_com_init():
    import pythoncom
    pythoncom.CoInitializeEx(None, pythoncom.COINIT_MULTITHREADED)

Element Types Supported

WINDOW, BUTTON, EDIT, MENU, MENU_ITEM
TAB, CHECKBOX, RADIO_BUTTON, COMBOBOX
LIST, LIST_ITEM, TEXT, UNKNOWN

Data Structures

@dataclass
class UIElement:
    title: str
    element_type: ElementType
    rect: tuple          # (x, y, width, height)
    automation_id: str    # UIA AutomationId
    class_name: str       # Window class name
    is_visible: bool
    is_enabled: bool
    children: List[UIElement]

@dataclass
class WindowInfo:
    title: str
    process_name: str
    rect: tuple
    is_active: bool
    elements: List[UIElement]

3. CV/UI Integration (`cv_ui_integration.py`)

Hybrid UI detection combining UIA accessibility with computer vision.

Detection Methods (Priority Order)

UIA Detection: Uses Windows Accessibility API for precise element location
Template Matching: OpenCV-based image matching for visual elements
LLM Vision: Groq/Gemini vision for complex UI interpretation
Coordinate Verification: All LLM coordinates verified against UIA elements

ScreenHelper Class

Handles multi-monitor and DPI scaling:

class ScreenHelper:
    _dpi_scale: float              # System DPI (1.0 = 100%)
    _monitor_info: Dict            # Per-monitor bounds

    def adjust_coords_for_monitor(x, y, window_rect):
        # Offset coordinates for secondary monitors

    def scale_screenshot_for_dpi(screenshot_path):
        # Scale screenshot to match actual pixel coordinates

RobustClicker with Circuit Breaker

class RobustClicker:
    _max_retries: int = 3
    _circuit_breaker_max_iterations: int = 10
    _circuit_breaker_time_budget: float = 30.0  # seconds

    # Exponential backoff: 0.5s → 1.0s → 2.0s (capped at 5s)
    def _get_retry_delay(attempt) -> float

4. Task Manager (`task_manager.py`)

Executes tools/actions requested by the AI engine.

Permission System

DANGEROUS_ACTIONS_REGEX patterns detect:

File deletion: delete, remove
System commands: shutdown, restart, kill
Registry modifications: reg delete, reg add
Shell execution: exec, cmd, powershell

class PermissionSystem:
    DANGEROUS_PATTERNS = [
        (r"delete", r"(file|folder|directory)", "delete_file", ...),
        (r"shutdown", r".*", "shutdown_pc", ...),
        # ... 17 total patterns
    ]

    SYSTEM_DIRS = [
        r"C:\Windows", r"C:\Program Files",
        r"C:\System32", r"C:\Boot", r"C:\Recovery"
    ]

Command Cache

Expiry: 24 hours
Max Size: 50 entries
Validation: Schema validation, sensitive data redaction

5. Error Handler (`error_handler.py`)

Structured error classification and recovery.

Error Categories

Category	Description	Action
RETRYABLE	Transient failures (timeout, network)	Retry with backoff
FATAL	Unrecoverable errors	Log and escalate
ESCALATION	Requires human intervention	Block and alert

Retry Strategies

class RetryStrategy:
    max_retries: int = 3
    base_delay: float = 0.5
    max_delay: float = 10.0
    exponential_base: float = 2.0
    jitter: bool = True

AI Engine & Multimodal Processing

Vision Pipeline

User Request: "Click the Save button"

┌─────────────────────────────────────────────────────────────┐
│ 1. Screenshot Capture                                        │
│    - pyautogui.screenshot()                                 │
│    - Scale for DPI if needed                                │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Element Detection                                         │
│    a) UIA Lookup (if window context available)              │
│    b) Template Matching (OpenCV)                            │
│    c) LLM Vision Analysis                                   │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Coordinate Verification                                  │
│    - Check if LLM coords fall within any UIA element        │
│    - Penalize confidence if unverified                     │
│    - Filter elements below 0.3 confidence threshold         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. Action Execution                                         │
│    - pyautogui.click(x, y)                                 │
│    - Adjust for monitor offset/DPI                           │
│    - Verify state change post-click                          │
└─────────────────────────────────────────────────────────────┘

Model Selection Strategy

Text + Tools: Groq llama-3.3-70b-versatile (primary)
Vision: Groq llama-3.2-11b-vision-preview (primary)
Fallback: Gemini 1.5 Flash for both

Conversation History Management

All messages stored as plain dictionaries to prevent type confusion:

def _message_to_dict(msg) -> dict:
    return {
        "role": getattr(msg, "role", "assistant"),
        "content": getattr(msg, "content", None),
        "tool_calls": getattr(msg, "tool_calls", None),
        "tool_call_id": getattr(msg, "tool_call_id", None)
    }

Desktop Automation Stack

Action Execution Flow

AI Decision → Tool Call → Permission Check → Action → Verification → State Update

Permission Check:
├─ Dangerous pattern detected?
├─ System directory access?
├─ Path traversal attempt?
└─ Callback to user if needed

Supported Actions

Category	Actions
Application	open_application, close_application, switch_to_window
Navigation	click_element, double_click, right_click, hover
Text Input	type_text, press_key, hotkey, select_all
File System	create_file, read_file, delete_file, list_directory
Clipboard	read_clipboard, write_clipboard
System	shutdown_pc, restart_pc, get_system_info
Web	search_google, open_url, get_page_text
Office	send_email, create_document

State Verification

Every click action is verified by comparing before/after state:

async def _execute_click_with_verification(element):
    pre_state = desktop_state.get_state_summary()
    pyautogui.click(x, y)
    await asyncio.sleep(0.2)
    post_state = desktop_state.get_state_summary()
    return pre_state != post_state or _verify_element_exists(element)

UI Detection & Element Matching

Detection Method Priority

UIA (Windows Accessibility): Fastest, most reliable for standard Windows apps
Template Matching (OpenCV): Good for custom/non-standard UI
LLM Vision: Last resort for complex or dynamic UIs

Localization Support

UI element matching supports localized button text:

_LOCALIZED_UI_TERMS = {
    "de_DE": {"save": ["Speichern"], "open": ["Öffnen"], ...},
    "fr_FR": {"save": ["Enregistrer"], "open": ["Ouvrir"], ...},
    "ja_JP": {"save": ["保存"], "open": ["開く"], ...},
    # ... 6 languages supported
}

Hallucination Prevention

LLM-provided coordinates are verified against UIA elements:

def _verify_llm_coordinates(elements):
    for elem in elements:
        if _is_coord_in_any_element_bounds(elem):
            elem.confidence += 0.3   # Boost verified coords
        else:
            elem.confidence -= 0.4   # Penalize hallucinated
    return [e for e in elements if e.confidence > 0.3]

Security & Safety Systems

1. Permission System

All dangerous operations require explicit permission:

DANGEROUS_PATTERNS = [
    (r"delete", r"(file|folder)", "delete_file"),
    (r"shutdown", r".*", "shutdown_pc"),
    (r"exec", r"(shell|cmd|powershell)", "execute_shell"),
    # ... 17 patterns total
]

Built-in Safety:

SAFE_MODE flag for human-in-the-loop
System directory protection
Path traversal detection
Confirmation dialogs for destructive actions

2. Input Sanitization

Clipboard Injection Prevention:

PROMPT_INJECTION_PATTERNS = [
    r"(?i)ignore\s+(previous|all|your)\s+instructions",
    r"(?i)jailbreak",
    r"(?i) DAN\s+mode",
    # ... 15+ patterns
]

SKILL.md Template Injection Prevention:

def _sanitize_skill_content(content):
    # Remove {{...}}, {%...%}, <script>, javascript:
    sanitized = re.sub(r'\{\{[^}]+\}\}', '[TEMPLATE_REMOVED]')
    sanitized = re.sub(r'(?i)<script[^>]*>.*?</script>', '[SCRIPT_REMOVED]')
    return sanitized

3. Path Traversal Protection

def create_file(path):
    real_path = os.path.realpath(path)
    if any('..' in part for part in path.split(os.sep)):
        return "Error: Path traversal detected"
    if is_system_directory(real_path):
        return "Error: Access denied to system directory"

4. Cache Validation

def _validate_cache_entry(entry):
    if not isinstance(entry, dict):
        return False
    for k in ["tool", "args", "result", "timestamp"]:
        if k not in entry:
            return False
    # Redact sensitive keys
    for k in SENSITIVE_CACHE_KEYS:
        if k in entry:
            entry[k] = "***REDACTED***"
    return True

5. API Key Security

ALLOWED_API_KEY_PREFIXES = {
    "GROQ_API_KEY": ["gsk_"],
    "GOOGLE_API_KEY": ["AIza"],
}

def _validate_api_key(key_name, value):
    if not any(value.startswith(p) for p in ALLOWED_PREFIXES[key_name]):
        return False  # Invalid prefix = potential fake key

Error Handling & Recovery

Error Classification

class ErrorClassifier:
    RETRY_PATTERNS = {
        "timeout": {
            "patterns": ["timeout", "timed out"],
            "severity": ErrorSeverity.MEDIUM,
            "suggested_fix": "Retry with exponential backoff"
        },
        "rate_limit": {
            "patterns": ["rate limit", "429"],
            "severity": ErrorSeverity.HIGH,
            "suggested_fix": "Wait and retry with longer backoff"
        },
        "connection": {
            "patterns": ["connection", "network"],
            "severity": ErrorSeverity.MEDIUM,
            "suggested_fix": "Check network, retry after delay"
        }
    }

Circuit Breaker Pattern

class RobustClicker:
    _circuit_breaker_max_iterations = 10
    _circuit_breaker_time_budget = 30.0  # seconds

    def _check_circuit_breaker(self):
        if self._circuit_breaker_iterations >= MAX:
            return False  # Block further attempts
        if time.time() - self._start_time > BUDGET:
            return False  # Time exceeded
        return True

Multi-Monitor & DPI Scaling Support

Multi-Monitor Handling

class ScreenHelper:
    def get_primary_monitor_offset(self) -> Tuple[int, int]:
        # Returns (x, y) offset of primary monitor

    def adjust_coords_for_monitor(self, x, y, window_rect):
        # Adjust for windows on secondary monitors
        # pyautogui uses primary monitor coords
        # Secondary monitors may have negative coords

DPI Scaling Detection

def _init_dpi(self):
    user32 = ctypes.windll.user32
    user32.SetProcessDPIAware()
    self._dpi_scale = user32.GetDpiForSystem() / 96.0

def scale_screenshot_for_dpi(self, screenshot_path):
    if self._dpi_scale != 1.0:
        img = Image.open(screenshot_path)
        new_size = (int(img.width * self._dpi_scale), ...)
        img_scaled = img.resize(new_size, Image.LANCZOS)
        img_scaled.save(screenshot_path)

Internationalization & Localization

Supported Locales

Locale	Language	Coverage
en_US	English	Default (no mapping needed)
de_DE	German	24 common UI terms
fr_FR	French	24 common UI terms
es_ES	Spanish	24 common UI terms
ja_JP	Japanese	24 common UI terms
zh_CN	Chinese (Simplified)	24 common UI terms
ko_KR	Korean	24 common UI terms

Element Matching with Localization

def _match_element(elements, description):
    search_terms = set(description.lower().split())

    # Expand with localized terms
    for word in search_terms:
        localized = get_localized_terms(word)  # ["Speichern", "Save"]
        search_terms.update(localized)

    # Match against element labels
    for elem in elements:
        if _fuzzy_match(search_terms, elem.label):
            return elem

Logging & Observability

Structured Logging with Correlation IDs

class CorrelationLogger:
    session_id: str        # Unique per application run
    correlation_id: str    # Unique per action/request

    def log_action(action, target, result, duration_ms):
        # Format: [SESSION_ID][CORR_ID][INFO] ACTION: action | target=x result=y duration_ms=z

    def log_api_call(provider, model, success, duration_ms, error):
        # Track API performance and failures

    def log_tool_execution(tool_name, args, success, duration_ms):
        # Monitor tool performance

Log Context Manager

with LogContext(user_id="123", action="click"):
    # All logs within this block include user_id=123 and action=click
    click_element("Save")

Capabilities Summary

What SmudgeAI Can Do

Capability	Implementation	Status
Open/close applications	`subprocess` + pyautogui	✅
Click UI elements	UIA + pyautogui	✅
Type text	pyautogui	✅
Read clipboard	pyperclip (sanitized)	✅
Navigate file system	os/shutil modules	✅
Web search	Google search API	✅
Send emails	SMTP	✅
Handle multi-monitor	ScreenHelper class	✅
Handle DPI scaling	ctypes DPI detection	✅
Localized UI matching	7 languages	✅
Permission system	17 dangerous patterns	✅
Error recovery	Circuit breaker + retry	✅
Vision analysis	Groq Vision + Gemini	✅
Coordinate verification	UIA validation	✅
Prompt injection detection	Regex patterns	✅
Path traversal prevention	realpath + validation	✅
Command cache	Validated + redacted	✅

What SmudgeAI Cannot Do (Yet)

Capability	Limitation	Priority
Browser automation	No Playwright/CDP	High
Cross-platform	Windows only	Medium
Drag-and-drop verification	Basic implementation	Medium
Voice input	Basic STT support	Low
Memory persistence	Session only	Medium

Comparison with OpenClaw

Aspect	SmudgeAI	OpenClaw
UI Understanding	UIA + CV + Vision (hybrid)	CDP + ARIA snapshots
Action Execution	pyautogui (coordinate-based)	Playwright (ref-based)
Verification	Before/after state diff	Explicit state contracts
Cross-platform	Windows only	macOS/Linux/Windows
Planning	LLM-based	Attempt-based with compaction
Rate Limiting	Built-in circuit breaker	API-level
Security	Permission system + injection prevention	Not evaluated
Localization	7 languages	Not evaluated

Where SmudgeAI Excels

Hybrid Detection: Combines UIA, CV, and Vision for robustness
Security: Comprehensive permission and injection prevention
Localization: Native multi-language UI matching
Error Recovery: Circuit breaker and retry strategies built-in

Where SmudgeAI Can Improve

Browser Automation: OpenClaw's CDP-based approach is more reliable
Action Verification: OpenClaw's state contracts are more explicit
Cross-platform: SmudgeAI is Windows-only

Bug Fixes & Security Hardening

Critical Bugs Fixed

Bug	Root Cause	Fix
COM Threading Conflict	pywinauto initialized before COM	Explicit `CoInitializeEx` at usage time
Hallucinated Coordinates	LLM provided unverified coords	UIA verification + confidence adjustment
History Type Mixing	Object/dict type inconsistency	Always serialize to dict
Infinite Loops	No circuit breaker	Max iterations + time budget
Silent Exception Crashes	No exception handler in Worker	Signal-based error propagation

Security Vulnerabilities Fixed

Vulnerability	Attack Vector	Mitigation
Prompt Injection	Malicious clipboard content	Regex pattern detection + redaction
Template Injection	Malicious SKILL.md	Template tag sanitization
API Key Exposure	.env commit	Key prefix validation + masked logging
Path Traversal	`../` in file paths	realpath validation + forbidden dirs
Unsafe Deserialization	Malicious cache JSON	Schema validation + redaction

Getting Started

Prerequisites

Windows 10/11
Python 3.9+
API Keys (Groq and/or Google Gemini)

Installation

# Clone the repository
git clone https://github.com/your-repo/SmudgeAI.git
cd SmudgeAI

# Install dependencies
pip install -r requirements.txt

# Create .env file
echo "GROQ_API_KEY=gsk_xxxxx" > .env
echo "GOOGLE_API_KEY=xxxxx" >> .env

# Run the application
python main.py

Requirements

pyqt5>=5.15.0
pywinauto>=0.6.8
pyautogui>=0.9.54
opencv-python>=4.8.0
Pillow>=10.0.0
groq>=0.4.0
google-generative-ai>=0.3.0
python-dotenv>=1.0.0
pyperclip>=1.8.2
pygetwindow>=0.0.9

Configuration

Environment Variables

# AI Provider
AI_PROVIDER=groq  # or "gemini"

# API Keys
GROQ_API_KEY=gsk_xxxxx
GOOGLE_API_KEY=AIzaxxxxx

# Search APIs (Optional)
TAVILY_API_KEY=tvly-xxxxx
SERPER_API_KEY=serper_xxxxx

# Email (Optional)
EMAIL_USER=your_email@gmail.com
EMAIL_PASSWORD=app_specific_password

# Logging
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR
LOG_FILE=smudgeai.log

# Safety
SAFE_MODE=True  # Require confirmation for dangerous actions

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
skill		skill
tests		tests
workflow_data		workflow_data
workspace		workspace
ARCHITECTURE.md		ARCHITECTURE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
README.md		README.md
ai_engine.py		ai_engine.py
ai_wave.gif		ai_wave.gif
app_control.py		app_control.py
command_cache.json		command_cache.json
config.py		config.py
cv_ui_integration.py		cv_ui_integration.py
desktop_state.py		desktop_state.py
enhanced_memory.py		enhanced_memory.py
enhanced_system.py		enhanced_system.py
error_handler.py		error_handler.py
example.file		example.file
gui.py		gui.py
jarvis.log		jarvis.log
local_vlm.py		local_vlm.py
main.py		main.py
monitoring.py		monitoring.py
multi_agent.py		multi_agent.py
requirements.txt		requirements.txt
skill_manager.py		skill_manager.py
speech_engine.py		speech_engine.py
structured_logging.py		structured_logging.py
system_control.py		system_control.py
task_manager.py		task_manager.py
task_planner.py		task_planner.py
test_ghost_detection.py		test_ghost_detection.py
vector_memory.py		vector_memory.py
web_automation.py		web_automation.py
workflow_learner.py		workflow_learner.py

Folders and files

Latest commit

History

Repository files navigation

SmudgeAI

Table of Contents

Overview

Key Design Goals

Architecture

High-Level System Diagram

Component Responsibilities

Core Components

1. AI Engine (ai_engine.py)

Features

Rate Limiter Implementation

Vision Pipeline

2. Desktop State (desktop_state.py)

COM Threading Fix

Element Types Supported

Data Structures

3. CV/UI Integration (cv_ui_integration.py)

Detection Methods (Priority Order)

ScreenHelper Class

RobustClicker with Circuit Breaker

4. Task Manager (task_manager.py)

Permission System

Command Cache

5. Error Handler (error_handler.py)

Error Categories

Retry Strategies

AI Engine & Multimodal Processing

Vision Pipeline

Model Selection Strategy

Conversation History Management

Desktop Automation Stack

Action Execution Flow

Supported Actions

State Verification

UI Detection & Element Matching

Detection Method Priority

Localization Support

Hallucination Prevention

Security & Safety Systems

1. Permission System

2. Input Sanitization

3. Path Traversal Protection

4. Cache Validation

5. API Key Security

Error Handling & Recovery

Error Classification

Circuit Breaker Pattern

Multi-Monitor & DPI Scaling Support

Multi-Monitor Handling

DPI Scaling Detection

Internationalization & Localization

Supported Locales

Element Matching with Localization

Logging & Observability

Structured Logging with Correlation IDs

Log Context Manager

Capabilities Summary

What SmudgeAI Can Do

What SmudgeAI Cannot Do (Yet)

Comparison with OpenClaw

Where SmudgeAI Excels

Where SmudgeAI Can Improve

Bug Fixes & Security Hardening

Critical Bugs Fixed

Security Vulnerabilities Fixed

Getting Started

Prerequisites

Installation

Requirements

Configuration

Environment Variables

Roadmap

Phase 1: Foundation (Completed ✅)

Phase 2: Reliability (Completed ✅)

Phase 3: Security (Completed ✅)

Phase 4: Autonomy (In Progress)

1. AI Engine (`ai_engine.py`)

2. Desktop State (`desktop_state.py`)

3. CV/UI Integration (`cv_ui_integration.py`)

4. Task Manager (`task_manager.py`)

5. Error Handler (`error_handler.py`)

Packages