Automata is a domain-agnostic, heuristic-driven web automation and extraction engine. It allows users to submit natural-language goals to scrape and analyze modern websites. The system automatically navigates, parses complex DOM trees without hardcoded selectors, and streams real-time execution logs and screenshots to a sleek Next.js dashboard.
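To give a feel for what selector-free parsing can look like, here is a minimal sketch that scores every element by how many of its direct children share the same tag structure, then picks the highest-scoring element as the likely data container. This is an illustrative assumption, not Automata's actual heuristic (see docs/reasoning.md for that); the names `Node`, `TreeBuilder`, `repetition_score`, and `best_container` are hypothetical, and only the Python standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the demo).
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare-bones element tree (tags only, no text or attributes)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def signature(node):
    """Structural fingerprint: the tag plus the tags of its direct children."""
    return (node.tag, tuple(child.tag for child in node.children))


def repetition_score(node):
    """Number of direct children sharing the most common fingerprint."""
    if not node.children:
        return 0
    counts = Counter(signature(child) for child in node.children)
    return counts.most_common(1)[0][1]


def best_container(root):
    """Return the element with the highest repetition score, and that score."""
    best, best_score = root, repetition_score(root)
    stack = list(root.children)
    while stack:
        node = stack.pop()
        score = repetition_score(node)
        if score > best_score:
            best, best_score = node, score
        stack.extend(node.children)
    return best, best_score


SAMPLE_HTML = (
    "<body><nav><a>Home</a><a>About</a></nav>"
    "<ul>"
    "<li><h2>A</h2><span>1</span></li>"
    "<li><h2>B</h2><span>2</span></li>"
    "<li><h2>C</h2><span>3</span></li>"
    "</ul></body>"
)

builder = TreeBuilder()
builder.feed(SAMPLE_HTML)
container, score = best_container(builder.root)
```

In the sample, the `ul` wins because its three `li` children share one structure, while the `nav` has only two matching links. No class names or IDs are consulted, which is the property that keeps this style of extraction from breaking when a site renames its CSS classes.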
- Domain-Agnostic Extraction Engine: Relies on semantic structure scoring instead of fragile CSS selectors.
- Async Concurrency: Process multiple extraction jobs simultaneously using built-in asyncio.Semaphore queuing.
- Live Observability: Real-time WebSocket streaming of logs and headless browser screenshots.
- Premium Dashboard: A sleek, responsive Next.js 14 frontend with concurrent job history, formatted datasets, and raw JSON viewers.
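The asyncio.Semaphore queuing mentioned in the feature list can be sketched roughly as follows. This is a minimal illustration, not the backend's real job runner: `MAX_CONCURRENT_JOBS`, `run_job`, and `run_all` are hypothetical names, and the `asyncio.sleep` call stands in for the actual navigation and extraction work.

```python
import asyncio

MAX_CONCURRENT_JOBS = 2  # hypothetical limit, not taken from the project


async def run_job(job_id: int, semaphore: asyncio.Semaphore, log: list) -> str:
    """Simulate one extraction job; waits for a free slot before starting."""
    async with semaphore:  # suspends here while all slots are taken
        log.append(f"start {job_id}")
        await asyncio.sleep(0.01)  # placeholder for navigation + extraction
        log.append(f"done {job_id}")
        return f"result-{job_id}"


async def run_all(job_ids: list) -> tuple:
    """Submit every job at once; the semaphore caps how many run together."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_JOBS)
    log: list = []
    results = await asyncio.gather(
        *(run_job(job_id, semaphore, log) for job_id in job_ids)
    )
    return list(results), log


results, log = asyncio.run(run_all([1, 2, 3, 4]))
```

All four jobs are submitted together, but at most two are inside the `async with` block at any moment; the rest queue on the semaphore until a slot frees up.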
The easiest and recommended way to run the entire stack is via Docker Compose.
- Docker and Docker Compose installed on your machine.
- Clone the repository (if you haven't already) and navigate to the project root:
  cd automata/backend
- Start the stack: This will spin up the PostgreSQL database, the FastAPI Python backend, and the Next.js frontend all at once.
  docker-compose up --build
- Access the Application:
  - Frontend UI: http://localhost:3000
  - Backend API Docs: http://localhost:8000/docs
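Once the containers are running, a quick way to confirm both endpoints respond is a small standard-library check like the one below. It is only a sketch: the two URLs come from the list above, while `SERVICES`, `is_up`, and `report` are illustrative names and not part of the project.

```python
import urllib.error
import urllib.request

# URLs taken from the list above; everything else here is illustrative.
SERVICES = {
    "frontend": "http://localhost:3000",
    "backend docs": "http://localhost:8000/docs",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True for a 2xx/3xx answer, False on errors or HTTP error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, DNS failures, and 4xx/5xx responses.
        return False


def report() -> dict:
    """Check every known service and map its name to an up/down flag."""
    return {name: is_up(url) for name, url in SERVICES.items()}
```

Calling `report()` after `docker-compose up --build` has finished should map both names to `True`; a `False` entry usually means that container is still starting or failed to boot.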
If you wish to run the services individually on your host machine, follow the steps below.
Ensure you have a PostgreSQL server running locally. Create a database named automata.
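The backend reaches this database through the DATABASE_URL value set in its .env file during the backend steps. As a sanity check, the sketch below splits such a URL into its parts so you can confirm it points at the automata database on the expected host and port. It uses only the standard library; `describe_database_url` is a hypothetical helper, not something the project ships.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Break a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),  # path is "/automata"
    }


# Example value from the backend .env instructions in this README.
info = describe_database_url(
    "postgresql://postgres:postgres@localhost:5432/automata"
)
```

If `info["database"]` is not `automata`, or the host and port do not match your local PostgreSQL server, the backend will fail to connect on startup.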
Navigate to the backend folder:
cd backend
Set up your virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Install Playwright browsers:
playwright install chromium
Create a .env file in the backend/ directory:
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/automata
Start the backend server:
uvicorn app.main:app --reload --port 8000
Navigate to the frontend folder:
cd frontend
Install dependencies:
npm install
Create a .env.local file in the frontend/ directory:
NEXT_PUBLIC_API_URL=http://localhost:8000
Start the frontend development server:
npm run dev
For a detailed technical breakdown of the architecture, design decisions, and heuristic engine philosophy, please refer to the docs/ folder:
- docs/reasoning.md - System architecture and design choices.
- docs/schema.sql - Database layout.