simwai · Mar 15, 2026
diff --git a/.env.example b/.env.example
@@ -2,19 +2,32 @@
 AUTH_STORAGE_PATH=.storage/auth.json
 
 # Scraping behavior
-WAIT_MODE=fixed
-RATE_LIMIT_MS=3000
-PARALLEL_WORKERS=2
+# DISCOVERY_MODE: api (fast), scroll (stealth), interaction (direct), ai (smart)
+DISCOVERY_MODE=api
+# EXTRACTION_MODE: api (fast), dom (classic), native (interaction-export), ai (smart-dom)
+EXTRACTION_MODE=api
+WAIT_MODE=dynamic
+RATE_LIMIT_MS=1000
+PARALLEL_WORKERS=5
 CHECKPOINT_SAVE_INTERVAL=10
 
 # Vector search
 ENABLE_VECTOR_SEARCH=true
 
 # AI services
-GEMINI_API_KEY=
+# LLM_SOURCE: 'ollama' or 'openrouter'
+LLM_SOURCE=ollama
+# LLM_RAG_MODEL: Model for text reasoning and RAG
+LLM_RAG_MODEL=deepseek-r1:7b
+# LLM_VISION_MODEL: Model for vision tasks and captcha bypass
+LLM_VISION_MODEL=qwen3.5:4b
+LLM_EMBED_MODEL=nomic-embed-text
+
+# Ollama Specific
 OLLAMA_URL=http://localhost:11435
-OLLAMA_MODEL=deepseek-r1
-OLLAMA_EMBED_MODEL=nomic-embed-text
+
+# OpenRouter Specific
+OPENROUTER_API_KEY=
 
 # Paths
 EXPORT_DIR=exports
@@ -23,4 +36,4 @@ VECTOR_INDEX_PATH=.storage/vector-index
 
 # Browser behavior
 # HEADLESS can be 'true', 'false', or 'new'
-HEADLESS=true
+HEADLESS=false
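The new mode variables each accept a fixed set of values, and `DISCOVERY_MODE` and `EXTRACTION_MODE` do not share the same set. The following is a minimal sketch of how such settings could be validated at startup. It is not the project's actual loader: `loadScraperConfig`, `pickOne`, `positiveInt`, and `ScraperConfig` are hypothetical names, and the fallback values simply mirror the ones in this `.env.example`.

```typescript
// Hypothetical validation sketch for the variables added to .env.example.
// None of these names come from the project; defaults mirror the diff above.
type DiscoveryMode = "api" | "scroll" | "interaction" | "ai";
type ExtractionMode = "api" | "dom" | "native" | "ai";
type LlmSource = "ollama" | "openrouter";

interface ScraperConfig {
  discoveryMode: DiscoveryMode;
  extractionMode: ExtractionMode;
  llmSource: LlmSource;
  rateLimitMs: number;
  parallelWorkers: number;
}

const DISCOVERY_MODES: readonly DiscoveryMode[] = ["api", "scroll", "interaction", "ai"];
const EXTRACTION_MODES: readonly ExtractionMode[] = ["api", "dom", "native", "ai"];
const LLM_SOURCES: readonly LlmSource[] = ["ollama", "openrouter"];

// Returns the fallback when the variable is unset, otherwise insists on an allowed value.
function pickOne<T extends string>(
  name: string,
  raw: string | undefined,
  allowed: readonly T[],
  fallback: T,
): T {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = raw.trim().toLowerCase();
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")}; got "${raw}"`);
  }
  return match;
}

// Parses a strictly positive integer such as RATE_LIMIT_MS or PARALLEL_WORKERS.
function positiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer; got "${raw}"`);
  }
  return parsed;
}

function loadScraperConfig(env: Record<string, string | undefined>): ScraperConfig {
  const llmSource = pickOne("LLM_SOURCE", env["LLM_SOURCE"], LLM_SOURCES, "ollama");
  // Assumption: the cloud provider cannot work without a key, so fail early.
  if (llmSource === "openrouter" && !env["OPENROUTER_API_KEY"]) {
    throw new Error("LLM_SOURCE=openrouter requires OPENROUTER_API_KEY");
  }
  return {
    discoveryMode: pickOne("DISCOVERY_MODE", env["DISCOVERY_MODE"], DISCOVERY_MODES, "api"),
    extractionMode: pickOne("EXTRACTION_MODE", env["EXTRACTION_MODE"], EXTRACTION_MODES, "api"),
    llmSource,
    rateLimitMs: positiveInt("RATE_LIMIT_MS", env["RATE_LIMIT_MS"], 1000),
    parallelWorkers: positiveInt("PARALLEL_WORKERS", env["PARALLEL_WORKERS"], 5),
  };
}
```

In a Node.js entry point this would be called as `loadScraperConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test and makes a typo such as `EXTRACTION_MODE=scroll` fail loudly instead of silently falling back.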
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,89 +1,72 @@
-# Contributing to the Evolution of Perplexity History Export
+# Contributing to Perplexity History Export
 
-Welcome, seeker of organized intelligence. We are delighted that you've chosen to contribute your cognitive energy to this system. By refining this tool, we collectively enhance our ability to synthesize knowledge from our digital interactions.
+We welcome contributions! To ensure a smooth development process and maintain high code quality, please follow these guidelines.
 
-This project is a manifestation of structured data extraction and semantic synthesis. To maintain the integrity of its cognitive architecture, we follow a specific workflow.
+## Development Environment Setup
 
----
-
-## Prerequisites for Co-Creation
-
-To effectively interact with the codebase, your local environment must support the following substrates:
-
-- **Node.js 20+**: The fundamental runtime for our operations.
-- **Ollama**: Essential for local embedding generation and RAG-based reasoning.
+1. **Install Node.js**: Ensure you have Node.js 20+ installed.
+2. **Install Ollama**:
+  - Download and install [Ollama](https://ollama.ai/).
   - `ollama pull nomic-embed-text` (for semantic vectors)
-  - `ollama pull deepseek-r1` (for generative synthesis)
-- **Playwright**: Our interface for navigating the complexities of the web.
-
----
-
-## The Developmental Lifecycle
-
-### 1. Initialization
-
-Clone the repository and instantiate the dependencies:
-
-```bash
-npm install
-npx playwright install chromium
-```
-
-### 2. Environment Configuration
-
-Establish your local parameters:
-
-```bash
-cp .env.example .env
-# Refine the variables to align with your local Ollama setup.
-```
-
-### 3. Iterative Development
-
-Launch the interactive environment to observe the system in action:
-
-```bash
-npm run dev
-```
-
-### 4. Integrity Verification (Testing)
-
-We adhere to a "Testing Trophy" philosophy, prioritizing integration tests that verify the emergent behavior of system components.
-
-- **Unit Tests**: `npm run test:unit`
-- **Integration Tests**: `npm run test:integration` (Uses MSW to simulate Ollama interactions)
-- **End-to-End**: `npm run test:e2e`
-
-Always ensure the full suite passes before proposing a merger:
-
-```bash
-npm run test
-```
-
-### 5. Syntactic Harmony (Formatting)
-
-We utilize `oxlint` and `oxfmt` for rapid, high-performance code analysis and formatting. Maintain the aesthetic and structural consistency of the codebase:
+  - `ollama pull deepseek-r1:7b` (for generative synthesis)
+  - `ollama pull qwen3.5:4b` (for vision-based bypass)
+3. **Install Dependencies**:
+  ```bash
+  npm install
+  ```
+4. **Prepare Environment Variables**:
+  ```bash
+  cp .env.example .env
+  ```
+5. **Install Playwright Browsers**:
+  ```bash
+  npx playwright install chromium
+  ```
+
+## Development Workflow
+
+- **Start in Dev Mode**:
+  ```bash
+  # Start the interactive development environment
+  npm run dev
+  ```
+- **Type Checking**:
+  ```bash
+  npm run type-check
+  ```
+- **Formatting & Linting**:
+  ```bash
+  npm run format
+  ```
+
+## Commit Guidelines
+
+We use [Conventional Commits](https://www.conventionalcommits.org/).
+
+- `feat:` for new features.
+- `fix:` for bug fixes.
+- `docs:` for documentation changes.
+- `chore:` for maintenance tasks.
+
+## Testing Strategy
+
+- **Unit Tests**: Place in `test/unit/`.
+- **Integration Tests**: Place in `test/integration/`.
+- **Run all tests**:
+  ```bash
+  npm test
+  ```
+
+## Pull Request Process
+
+1. Create a feature branch.
+2. Ensure all tests pass.
+3. Submit the PR with a clear description of the changes.
+
+## Build a Single Executable Application (SEA)
+
+To build the standalone executable for your platform:
 
 ```bash
-npm run format
+npm run build:exe
 ```
-
----
-
-## Proposing Cognitive Enhancements (PR Process)
-
-1. **Fork and Branch**: Create a branch with a descriptive prefix:
-   - `feat/` for novel capabilities.
-   - `fix/` for rectifying systemic discrepancies (bugs).
-   - `docs/` for enhancing the conceptual clarity of our documentation.
-2. **Commit with Intent**: Write clear, descriptive commit messages.
-3. **Synergize**: Open a Pull Request. Provide a concise summary of the changes and how they contribute to the system's overall utility.
-
----
-
-## Ethical and Intellectual Standards
-
-- **Clarity over Complexity**: While our goals are ambitious, our code should remain a model of lucidity.
-- **Robustness**: Build for resilience against the unpredictable nature of web interfaces and AI model outputs.
-
-Together, we are building a more coherent interface between human inquiry and machine intelligence.
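The commit guidelines in the new CONTRIBUTING.md list four Conventional Commits prefixes. As an illustration only, a commit header could be checked against them with a small parser like the one below. `parseCommitHeader` and `ParsedCommit` are hypothetical names, and nothing in this diff indicates that the repository enforces the convention automatically. The optional scope and `!` marker follow the general Conventional Commits header shape, `<type>[optional scope][!]: <description>`.

```typescript
// Hypothetical helper: checks a commit header against the four types
// listed in CONTRIBUTING.md. Not part of the project.
const COMMIT_TYPES = ["feat", "fix", "docs", "chore"] as const;
type CommitType = (typeof COMMIT_TYPES)[number];

interface ParsedCommit {
  type: CommitType;
  scope: string | null; // text inside the optional parentheses
  breaking: boolean; // true when the header carries a "!" marker
  description: string;
}

// <type>[(scope)][!]: <description>
const HEADER_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(?:\\(([^()]+)\\))?(!)?: (.+)$`,
);

function parseCommitHeader(header: string): ParsedCommit | null {
  const match = HEADER_PATTERN.exec(header.trim());
  if (match === null) return null;
  const [, type, scope, bang, description] = match;
  if (!type || !description) return null;
  return {
    type: type as CommitType,
    scope: scope ? scope : null,
    breaking: bang === "!",
    description,
  };
}
```

For example, `parseCommitHeader("feat(scraper): add api discovery mode")` yields type `feat` with scope `scraper`, while a header such as `"update stuff"` returns `null`.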
diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@
   <img src="https://img.shields.io/badge/Node.js-4c1d95?style=flat&logo=node.js&logoColor=white" alt="Node.js" />
   <img src="https://img.shields.io/badge/TypeScript-5b21b6?style=flat&logo=typescript&logoColor=white" alt="TypeScript" />
   <img src="https://img.shields.io/badge/Ollama-6d28d9?style=flat&logo=ollama&logoColor=white" alt="Ollama" />
-  <img src="https://img.shields.io/badge/Playwright-7c3aed?style=flat&logo=playwright&logoColor=white" alt="Playwright" />
+  <img src="https://img.shields.io/badge/Patchright-7c3aed?style=flat&logo=playwright&logoColor=white" alt="Patchright" />
   <img src="https://img.shields.io/badge/Vitest-8b5cf6?style=flat&logo=vitest&logoColor=white" alt="Vitest" />
 </p>
 
@@ -16,6 +16,7 @@
 
 - [Introduction](#introduction)
 - [Key Features](#key-features)
+- [Stealth & Behavioral Resilience](#stealth--behavioral-resilience)
 - [Environment Setup Guide](#environment-setup-guide)
   * [1. Install Node.js (The Engine)](#1-install-nodejs-the-engine)
   * [2. Install Ollama (The AI Intelligence)](#2-install-ollama-the-ai-intelligence)
@@ -39,13 +40,22 @@ This tool is designed to externalize your Perplexity.ai conversation history int
 
 ## Key Features
 
-- **Parallelized Extraction**: Leverages Playwright to extract multiple conversation threads simultaneously for high-velocity data retrieval.
+- **Parallelized Extraction**: Leverages worker pools to extract multiple conversation threads simultaneously for high-velocity data retrieval.
 - **Architectural Resilience**: Automatically restores browser contexts and retries operations, ensuring continuity amidst environmental instability.
 - **Advanced RAG (Retrieval-Augmented Generation)**: Engage in a cognitive dialogue with your history. The system employs intent analysis to synthesize broad summaries or pinpoint specific technical insights.
 - **Semantic Vector Search**: Move beyond keyword matching. Locate information based on conceptual depth and semantic relevance.
 - **Persistent State Tracking**: Frequent checkpoints allow the system to resume progress after any interruption.
 - **Interactive Synthesis (REPL)**: A streamlined command-line interface for human-system synergy.
 
+## Stealth & Behavioral Resilience
+
+The scraper models human browsing behavior to avoid triggering Cloudflare Turnstile challenges and to pass them when they do appear:
+
+- **Structural Interaction**: Locates the Turnstile widget in the page and interacts with it directly, then checks for a response token to confirm that the challenge was passed.
+- **Vision-Based Fallback**: If the structural approach fails, captures a screenshot and asks the vision model (`LLM_VISION_MODEL`) for the coordinates to interact with.
+- **Ghost-Cursor Integration**: Uses `ghost-cursor` to generate human-like, non-linear mouse paths that are harder to distinguish from real input.
+- **Session Reputation**: Builds browser trust through "Session Warming" (visiting the home page and simulating browsing) before accessing conversation data.
+
 ## Environment Setup Guide
 
 If you are new to development or don't have the necessary tools installed, follow these steps to set up your environment.
@@ -72,10 +82,11 @@ We recommend using a version manager to install Node.js. This allows you to easi
 ### 2. Install Ollama (The AI Intelligence)
 
 1. Download and install Ollama from [ollama.ai](https://ollama.ai).
-2. Open your terminal and pull the required models:
+2. The system will automatically pull the required models on first run, but you can also pull them manually:
    ```bash
    ollama pull nomic-embed-text
-   ollama pull deepseek-r1
+   ollama pull deepseek-r1:7b
+   ollama pull qwen3.5:4b
    ```
 
 ### 3. Download and Prepare the Project
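The README now says the required models are pulled automatically on first run. The project's actual logic is not shown in this diff, so the following is only a sketch of one way to decide which models still need pulling. It assumes the list of installed models comes from Ollama's `GET /api/tags` endpoint, whose response contains a `models` array with a `name` per entry. `findMissingModels` and `withDefaultTag` are hypothetical names.

```typescript
// Hypothetical sketch: compare required models against what Ollama reports
// as installed. Only the fields used here are declared.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Ollama reports untagged pulls as "<model>:latest", so normalize both sides.
function withDefaultTag(model: string): string {
  return model.includes(":") ? model : `${model}:latest`;
}

function findMissingModels(
  required: readonly string[],
  tags: OllamaTagsResponse,
): string[] {
  const installed = new Set(tags.models.map((entry) => withDefaultTag(entry.name)));
  return required.filter((model) => !installed.has(withDefaultTag(model)));
}
```

Given the three models named in this README and an installation that only has `nomic-embed-text:latest` and `deepseek-r1:7b`, the function returns `["qwen3.5:4b"]`, which is the one left to fetch with `ollama pull`.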
@@ -99,28 +110,27 @@ cp .env.example .env
 
 ### Key Environment Variables
 
-- **OLLAMA_URL**: Access point for your local AI engine (default: http://localhost:11434).
-- **OLLAMA_MODEL**: Cognitive model for RAG synthesis (e.g., deepseek-r1).
-- **OLLAMA_EMBED_MODEL**: Model for generating vector representations (e.g., nomic-embed-text).
+- **LLM_SOURCE**: Set to `ollama` (local) or `openrouter` (cloud).
+- **LLM_RAG_MODEL**: Cognitive model for RAG synthesis (default: `deepseek-r1:7b`).
+- **LLM_VISION_MODEL**: Model for vision-based security bypass (default: `qwen3.5:4b`).
 - **ENABLE_VECTOR_SEARCH**: Set to `true` to activate semantic and RAG layers.
+- **DISCOVERY_MODE** & **EXTRACTION_MODE**: `DISCOVERY_MODE` accepts `api`, `scroll`, `interaction`, or `ai`; `EXTRACTION_MODE` accepts `api`, `dom`, `native`, or `ai`.
 
 ## Usage Guide
 
 Launch the system:
 
 ```bash
-# Start the development environment
+# Start the system
 npm run dev
 ```
 
+**Note**: When running local AI models, keep at least **10 GB of free disk space** available.
+
 ### Operational Directives
 
 - **Start scraper (Library)**: Initiates extraction. Authenticate manually if required.
-- **Search conversations**: Interface with your history using various modes:
-  - **Auto**: Heuristic selection between semantic and exact search.
-  - **Semantic**: Fuzzy matching via high-dimensional vector space.
-  - **RAG**: Direct inquiry—e.g., "What did I learn about emergent intelligence?"
-  - **Exact**: Rapid string matching via ripgrep (bundled).
+- **Search conversations**: Interface with your history using various modes (Auto, Semantic, RAG, Exact).
 - **Build vector index**: Processes Markdown exports into a local vector store.
 - **Reset all data**: Purges checkpoints, authentication data, and the vector index.
 
@@ -140,11 +150,11 @@ For a detailed look at our RAG implementation, hybrid search strategy, and theor
 
 ### Project Structure
 
-- **src/ai/**: Ollama interaction and advanced RAG orchestration layers.
-- **src/scraper/**: Playwright-based extraction logic and parallel worker pool management.
+- **src/ai/**: Provider management and advanced RAG orchestration layers.
+- **src/scraper/**: Patchright-based extraction logic and parallel worker pool management.
 - **src/search/**: Vector storage (Vectra) and ripgrep search implementation.
 - **src/repl/**: Interactive CLI components.
-- **src/utils/**: Shared utility functions for data chunking and logging.
+- **src/utils/**: Shared utility functions for behavioral navigation and logging.
 
 ## Testing