Osama Altaf osamaaltaf-pk

🚀 Professional Profile

I am an AI Systems & Full-Stack Engineer focused on building production-grade LLM architectures, real-time Voice AI streaming systems, and high-throughput inference pipelines. I design software that bridges the gap between state-of-the-art AI research and practical, scalable engineering.

30+ Active Codebases covering LLM optimization, offline voice interfaces, and high-performance WebRTC streaming.
Deep Core Systems Integration: Orchestrating complex pipelines with Redis, Kafka, and hardware-aware mutual exclusion algorithms.
Production AI Deployments: Veteran developer of enterprise SaaS backends, browser automation agents, and localized AI portals.

🎯 Target Role Alignment & Capabilities

To provide immediate clarity for technical stakeholders and recruitment reviewers, my work directly maps onto the following target positions:

Target Pillar	Core Alignment & Competencies	Key Evidence (In my Repos)
Remote AI Engineer	Real-time audio processing, WebRTC audio streaming, Speech-to-Speech orchestration, and low-latency voice assistants.	Aura-TTS • OrpheusAssistant
LLM System Engineer	Edge model inference, localized LLM routing engines, Silero VAD integration, Kafka pipeline messaging, and local API optimization.	QuickCall • OrpheusAssistant
AI Research / Alignment	Fine-tuning with Unsloth LoRA/QLoRA, GRPO-based reasoning training without a critic model (DeepSeek-R1 style), and model alignment.	LLMs-Unsloth • smol-course
AI Full Stack Engineer	Enterprise web dashboards, modular NestJS/FastAPI backends, secure multi-tier authentication, PostgreSQL/Supabase DBs, and global semantic caching.	ASK ILM • OmniSupport AI • Gold-Arbitrage

🛠️ Technical Ecosystem

AI & LLM Systems

Real-Time & Backend Infrastructure

Frontend Development

📂 Featured Deep-Dives

Here is a curated overview of my primary repositories representing deep engineering focus and production capabilities:

1. 🧠 LLMs-Unsloth (LLM Optimization & Reasoning Hub)

A specialized research-to-production pipeline repository detailing high-throughput fine-tuning, reasoning models, and edge optimizations.

Engineering Highlights:
- Fine-tuning pipelines utilizing Unsloth kernels for parameter-efficient optimizations (LoRA / QLoRA).
- Reinforcement-learning training using Group Relative Policy Optimization (GRPO), which needs no separate critic model (introduced in DeepSeekMath and popularized by DeepSeek-R1), targeting Qwen3 8B.
- Horizontal engineering recipes for Mixture-of-Experts (MoE) kernels and ModernBERT dense classification systems.
Key Stack: Unsloth, PyTorch, LoRA, QLoRA, HuggingFace Transformers, TRL, JAX.
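
To illustrate why GRPO can drop the critic, here is a minimal sketch of its group-relative advantage step. It is not code from the repository: the function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the population standard deviation is an assumption (real implementations differ on this detail). Each sampled completion is scored against the other completions for the same prompt, so the group mean stands in for a learned value estimate.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. The group mean acts as the baseline, so no critic
    network is needed. `eps` (hypothetical default) avoids dividing by
    zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct, two not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat the group average get a positive advantage and the rest get a negative one; a group with identical rewards contributes no learning signal.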

2. 🎙️ Aura-TTS & OrpheusAssistant (Voice AI & Real-Time Orchestration)

A suite of voice engineering portals dedicated to running low-latency, state-of-the-art offline speech synthesis and real-time assistants.

Engineering Highlights:
- Aura-TTS: A unified speech workstation that runs locally within RAM/VRAM limits, using custom mutual-exclusion locking to manage the lifecycles of its engines (Kokoro-TTS, Pocket-TTS, Supertonic).
- OrpheusAssistant: An offline voice agent linking real-time bi-directional WebRTC streams (via aiortc), Whisper Large STT, and Llama 3.2 3B. Integrates directly with n8n for workflow and tool execution.
Key Stack: Python, FastAPI, WebRTC, WebSockets, PipeCat, PyTorch, n8n, Docker.
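
The engine-lifecycle locking mentioned for Aura-TTS can be sketched roughly as follows. This is an illustrative assumption, not the project's actual code: `EngineManager`, `use`, and the loader callables are hypothetical names. The idea is that a single lock guards both swapping and using an engine, so at most one model occupies memory at a time.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


class EngineManager:
    """Keeps at most one engine loaded, guarded by a mutex (hypothetical sketch)."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders            # engine name -> factory that loads it
        self._lock = threading.Lock()      # non-reentrant: do not nest use() calls
        self._active_name: Optional[str] = None
        self._active_engine: Optional[object] = None

    @contextmanager
    def use(self, name: str) -> Iterator[object]:
        """Yield the requested engine, swapping out the previous one if needed."""
        with self._lock:
            if self._active_name != name:
                # Drop the old engine first so its memory can be reclaimed
                # before the new one is loaded.
                self._active_name = None
                self._active_engine = None
                self._active_engine = self._loaders[name]()
                self._active_name = name
            # The lock stays held while the caller works with the engine,
            # so no other thread can unload it mid-synthesis.
            yield self._active_engine


if __name__ == "__main__":
    # Placeholder loaders; real ones would load model weights.
    manager = EngineManager({
        "kokoro": lambda: "kokoro-engine",
        "pocket": lambda: "pocket-engine",
    })
    with manager.use("kokoro") as engine:
        print("synthesizing with", engine)
```

Holding the lock for the whole `with` block trades throughput for a hard guarantee on peak memory, which is the usual choice on a single consumer GPU.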

3. ⚡ QuickCall (S2S Event-Driven Pipeline)

An event-driven speech-to-speech architecture demonstrating production backend streaming logic.

Engineering Highlights:
- Parallel Producer-Consumer architecture isolating audio recording from backend text transcription.
- Integrates Silero Voice Activity Detection (VAD) to segment speech in real time, handing each segment as a file to a background transcription worker.
- Transcriptions are published to Apache Kafka topics, triggering downstream TTS and post-processing APIs.
Key Stack: Python, PyTorch, Silero VAD, Faster Whisper, Apache Kafka, Pydantic.
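
The producer-consumer split described above can be sketched with the standard library alone. Everything here is a stand-in, assumed for illustration: strings play the role of VAD-segmented audio files, the `transcribe` callable replaces Faster Whisper, and a plain list replaces the Kafka topic. The names `record_segments`, `transcribe_worker`, and `run_pipeline` are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel telling the consumer that recording has ended


def record_segments(segments: Iterable[str], out_q: queue.Queue) -> None:
    """Producer: push each speech segment as soon as it is available."""
    for segment in segments:
        out_q.put(segment)  # blocks when the queue is full (backpressure)
    out_q.put(_STOP)


def transcribe_worker(
    in_q: queue.Queue,
    transcribe: Callable[[str], str],
    publish: Callable[[str], None],
) -> None:
    """Consumer: transcribe segments in arrival order and publish the text."""
    while True:
        segment = in_q.get()
        if segment is _STOP:
            break
        publish(transcribe(segment))


def run_pipeline(segments: Iterable[str], transcribe: Callable[[str], str]) -> List[str]:
    """Run recording and transcription on separate threads."""
    q: queue.Queue = queue.Queue(maxsize=8)
    published: List[str] = []  # stand-in for a Kafka topic
    consumer = threading.Thread(
        target=transcribe_worker, args=(q, transcribe, published.append)
    )
    producer = threading.Thread(target=record_segments, args=(segments, q))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return published


if __name__ == "__main__":
    print(run_pipeline(["hello", "world"], str.upper))
```

The bounded queue keeps slow transcription from letting unprocessed audio pile up without limit, while recording never waits on the model unless the buffer is full.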

4. 🎓 ASK ILM (Offline AI Learning Operating System)

An offline-first, open-source educational OS designed specifically for classrooms in developing regions.

Engineering Highlights:
- Multi-tier secure roles (Super Admin, Principal, Teacher, Student) with isolated school dashboards.
- Tri-Layer Semantic Cache and LLM router that serve previously generated school content from cache, reducing AI generation quota usage and cost.
- Integrated text-to-speech (Deepgram), document rendering (PDF, Excel tables), and local Firebase overrides.
Key Stack: React 19, TypeScript, Express, Supabase, Firebase, Anthropic SDK, Google GenAI SDK, Vite, Motion.
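
As a rough illustration of how a layered cache can sit in front of an LLM call, here is a minimal sketch. The three layers shown (exact match, normalized match, similarity match) are my assumption for the example and are not taken from the ASK ILM codebase; token-overlap (Jaccard) similarity stands in for embedding similarity, and `LayeredCache`, `ask`, and `threshold` are hypothetical names.

```python
import re
from typing import Callable, Dict, Optional


def _normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivially different prompts collide."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


class LayeredCache:
    """Checks three cache layers before paying for a generation (illustrative)."""

    def __init__(self, generate: Callable[[str], str], threshold: float = 0.6) -> None:
        self._generate = generate            # the expensive LLM call
        self._threshold = threshold          # minimum similarity for layer 3
        self._exact: Dict[str, str] = {}     # layer 1: raw prompt -> answer
        self._normalized: Dict[str, str] = {}  # layers 2 and 3: normalized prompt -> answer
        self.misses = 0                      # number of real generations

    def _similar(self, norm: str) -> Optional[str]:
        """Layer 3: best Jaccard match over cached normalized prompts."""
        tokens = set(norm.split())
        best_key, best_score = None, 0.0
        for key in self._normalized:
            other = set(key.split())
            union = tokens | other
            score = len(tokens & other) / len(union) if union else 0.0
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self._threshold:
            return self._normalized[best_key]
        return None

    def ask(self, prompt: str) -> str:
        if prompt in self._exact:                 # layer 1
            return self._exact[prompt]
        norm = _normalize(prompt)
        answer = self._normalized.get(norm)       # layer 2
        if answer is None:
            answer = self._similar(norm)          # layer 3
        if answer is None:                        # full miss: generate and store
            self.misses += 1
            answer = self._generate(prompt)
            self._normalized[norm] = answer
        self._exact[prompt] = answer
        return answer


if __name__ == "__main__":
    cache = LayeredCache(lambda p: f"generated answer for: {p}")
    cache.ask("What is photosynthesis?")
    cache.ask("what is photosynthesis")
    print("generations used:", cache.misses)
```

Cheap layers are checked first, so the similarity scan only runs when the exact and normalized lookups fail, and the generator is called only on a full miss.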

5. 🛡️ Omni Automator & Licensing System

An enterprise-grade productivity suite automating prompt queues for high-volume video/image generation pipelines.

Engineering Highlights:
- Extension Code: Chrome extension running background batch automation scripts with organic delay controls (stealth pacing) to circumvent platform limits.
- Licensing System: Fully integrated with a serverless backend that binds license keys to secure client-side Hardware Device IDs using Supabase PostgreSQL.
Key Stack: JavaScript, HTML5, Express, Supabase PostgreSQL, Vercel Serverless Functions.

📊 Developer Metrics & Impact

🛠️ Strong System Architect: Proficient in event streaming (Kafka), in-memory caching and state (Redis), and model optimization (Unsloth, Triton).
🌐 Clean API Designer: Expert in RESTful architecture, WebRTC bi-directional streams, and secure serverless gateways.
📦 Docker-first Deployer: Containerizing complex AI stacks with multi-stage builds and clean Docker Compose networking configs.

💼 Open to Remote Opportunities

I am actively exploring Remote AI Engineer, Junior LLM System Engineer, AI Research, and AI Full-Stack opportunities globally. Let's build the future of localized, low-latency, and event-driven AI systems.

📧 Get In Touch • 💬 Chat on WhatsApp
