I am a specialized AI Systems & Full-Stack Engineer focused on building production-grade LLM architectures, real-time Voice AI streaming systems, and high-throughput inference pipelines. I design software that bridges the gap between state-of-the-art AI research and practical, scalable engineering.
- 30+ Active Codebases covering LLM optimization, offline voice interfaces, and high-performance WebRTC streaming.
- Deep Core Systems Integration: Orchestrating complex pipelines with Redis, Kafka, and hardware-aware mutual exclusion algorithms.
- Production AI Deployments: Developer of enterprise SaaS backends, browser automation agents, and localized AI portals.
For technical stakeholders and recruiters, my work maps directly onto the following target roles:
| Target Pillar | Core Alignment & Competencies | Key Evidence (In my Repos) |
|---|---|---|
| Remote AI Engineer | Real-time audio processing, WebRTC audio streaming, Speech-to-Speech orchestration, and low-latency voice assistants. | Aura-TTS • OrpheusAssistant |
| LLM System Engineer | Edge model inference, localized LLM routing engines, Silero VAD integration, Kafka pipeline messaging, and local API optimization. | QuickCall • OrpheusAssistant |
| AI Research / Alignment | Fine-tuning with Unsloth LoRA/QLoRA, critic-free GRPO reinforcement learning for reasoning (DeepSeek-R1 style), and model alignment. | LLMs-Unsloth • smol-course |
| AI Full Stack Engineer | Enterprise web dashboards, modular NestJS/FastAPI backends, secure multi-tier authentication, PostgreSQL/Supabase DBs, and global semantic caching. | ASK ILM • OmniSupport AI • Gold-Arbitrage |
Here is a curated overview of my primary repositories representing deep engineering focus and production capabilities:
A specialized research-to-production pipeline repository detailing high-throughput fine-tuning, reasoning models, and edge optimizations.
- Engineering Highlights:
- Fine-tuning pipelines utilizing Unsloth kernels for parameter-efficient optimizations (LoRA / QLoRA).
- RLHF training using Group Relative Policy Optimization (GRPO), which needs no separate critic model (the approach popularized by DeepSeek-R1), targeting Qwen3 8B.
- Horizontal engineering recipes for Mixture-of-Experts (MoE) kernels and ModernBERT dense classification systems.
- Key Stack: Unsloth, PyTorch, LoRA, QLoRA, HuggingFace Transformers, TRL, JAX.
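To illustrate why GRPO can drop the critic, here is a minimal sketch of the group-relative advantage step: each sampled completion is scored against the mean and spread of its own group instead of a learned value estimate. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions for illustration, not code taken from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some trainers use sample std
    # eps guards against division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by a rule-based reward
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group average receive a positive advantage and the rest a negative one, so the policy update needs no separate value network.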
A suite of voice engineering portals dedicated to running low-latency, state-of-the-art offline speech synthesis and real-time assistants.
- Engineering Highlights:
- Aura-TTS: Created a unified speech workstation running locally under RAM/VRAM resource boundaries using custom mutual exclusion locking to manage engine lifecycles (Kokoro-TTS, Pocket-TTS, Supertonic).
- OrpheusAssistant: Advanced offline voice agent linking real-time bi-directional WebRTC streams (via aiortc), Whisper Large STT, and LLaMA 3.2 3B. Integrates directly with n8n to execute workflow tool actions.
- Key Stack: Python, FastAPI, WebRTC, WebSockets, PipeCat, PyTorch, n8n, Docker.
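The engine-lifecycle locking described for Aura-TTS can be sketched roughly as follows: a single lock guards the "active engine" slot, so only one model is resident at a time and a swap cannot race with another request. The class `EngineManager`, its `acquire` method, and the string stand-ins for engines are hypothetical and do not reflect the repository's actual code.

```python
import threading


class EngineManager:
    """Keeps at most one engine loaded at a time (illustrative sketch)."""

    def __init__(self, loaders):
        self._loaders = loaders          # name -> zero-argument factory
        self._lock = threading.Lock()    # serializes load/unload decisions
        self._active_name = None
        self._active_engine = None

    def acquire(self, name):
        with self._lock:
            if self._active_name != name:
                # Drop the old reference first so its memory can be reclaimed
                self._active_engine = None
                self._active_engine = self._loaders[name]()
                self._active_name = name
            return self._active_engine


# Stand-in factories; a real loader would build a TTS model here
manager = EngineManager({
    "kokoro": lambda: "kokoro-engine",
    "pocket": lambda: "pocket-engine",
})
first = manager.acquire("kokoro")
second = manager.acquire("pocket")
```

Holding the lock across the unload and the load is the simplest choice under tight RAM/VRAM limits, at the cost of blocking other callers while a model loads.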
An event-driven speech-to-speech architecture demonstrating production backend streaming logic.
- Engineering Highlights:
- Parallel Producer-Consumer architecture isolating audio recording from backend text transcription.
- Integrates Silero Voice Activity Detection (VAD) to segment speech in real time, feeding the segmented audio files to a background transcription worker.
- Transcriptions are published directly to Apache Kafka topics, triggering downstream TTS and post-processing APIs.
- Key Stack: Python, PyTorch, Silero VAD, Faster Whisper, Apache Kafka, Pydantic.
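The producer-consumer split above can be sketched with the standard library alone: a bounded queue decouples the recording side from the transcription worker. The names `record_segments`, `transcribe_worker`, and `fake_transcribe` are hypothetical, and the Silero VAD, Faster Whisper, and Kafka steps are replaced by placeholders, so this shows only the hand-off pattern.

```python
import queue
import threading

SENTINEL = None  # signals the worker that recording has finished


def record_segments(out_q, segments):
    """Producer: pushes each detected speech segment onto the queue."""
    for seg in segments:
        out_q.put(seg)  # blocks if the worker falls too far behind
    out_q.put(SENTINEL)


def transcribe_worker(in_q, results, transcribe):
    """Consumer: transcribes segments in arrival order until the sentinel."""
    while True:
        seg = in_q.get()
        if seg is SENTINEL:
            break
        results.append(transcribe(seg))


segment_queue = queue.Queue(maxsize=8)
transcripts = []
fake_transcribe = lambda seg: f"text:{seg}"  # stand-in for a real STT call

consumer = threading.Thread(
    target=transcribe_worker,
    args=(segment_queue, transcripts, fake_transcribe),
)
consumer.start()
record_segments(segment_queue, ["seg-0", "seg-1", "seg-2"])
consumer.join()
```

In a fuller pipeline the worker would publish each transcript to a Kafka topic instead of appending it to a list.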
A massive offline-first, open-source educational OS designed specifically for classrooms in developing regions.
- Engineering Highlights:
- Multi-tier secure roles (Super Admin, Principal, Teacher, Student) with isolated school dashboards.
- Tri-layer semantic cache and LLM router that serve cached school content first, reducing AI generation quota usage and cost.
- Integrated text-to-speech (Deepgram), document rendering (PDF, Excel tables), and local Firebase overrides.
- Key Stack: React 19, TypeScript, Express, Supabase, Firebase, Anthropic SDK, Google GenAI SDK, Vite, Motion.
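As a rough illustration of a layered cache in front of an LLM, the sketch below checks an exact-match layer, then a similarity layer, and only then calls the model. The three layers shown, the `LayeredCache` class, the bag-of-words `embed` function, and the 0.8 threshold are all assumptions for illustration; the project's real layers and embedding model are not described here.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LayeredCache:
    def __init__(self, generate, threshold=0.8):
        self._generate = generate    # fallback LLM call
        self._threshold = threshold  # assumed similarity cut-off
        self._exact = {}             # layer 1: normalized prompt -> answer
        self._semantic = []          # layer 2: (vector, answer) pairs

    def get(self, prompt):
        key = prompt.strip().lower()
        if key in self._exact:
            return self._exact[key], "exact"
        query = embed(prompt)
        for vec, answer in self._semantic:
            if cosine(query, vec) >= self._threshold:
                return answer, "semantic"
        answer = self._generate(prompt)  # layer 3: pay for a generation
        self._exact[key] = answer
        self._semantic.append((query, answer))
        return answer, "generated"


calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = LayeredCache(fake_llm)
r1 = cache.get("what is photosynthesis")
r2 = cache.get("What is photosynthesis")
r3 = cache.get("what is photosynthesis exactly")
```

Only the first request reaches the model; the second is served by the exact layer and the third by the similarity layer, which is where the quota savings come from.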
An enterprise-grade productivity suite automating prompt queues for high-volume video/image generation pipelines.
- Engineering Highlights:
- Extension Code: Chrome extension running background batch automation scripts with organic delay controls (stealth pacing) to circumvent platform limits.
- Licensing System: Fully integrated with a serverless backend that binds license keys to secure client-side Hardware Device IDs using Supabase PostgreSQL.
- Key Stack: JavaScript, HTML5, Express, Supabase PostgreSQL, Vercel Serverless Functions.
- 🛠️ Strong System Architect: Proficient in event streaming (Kafka), in-memory data stores (Redis), and model optimization (Unsloth, Triton).
- 🌐 Clean API Designer: Expert in RESTful architecture, WebRTC bi-directional streams, and secure serverless gateways.
- 📦 Docker-first Deployer: Containerizing complex AI stacks with multi-stage builds and clean Docker Compose networking configs.
I am actively exploring Remote AI Engineer, Junior LLM System Engineer, AI Research, and AI Full-Stack opportunities globally. Let's build the future of localized, low-latency, and event-driven AI systems.

