Dhruv Garg DhruvGarg111

Dhruv Garg

AI / ML Engineer • Computer Vision • Generative AI

Building practical AI systems, one focused iteration at a time.

Building intelligent systems that see, understand, and create.

🔬 Engineering Profile

I am a Machine Learning Engineer focused on Computer Vision and Agentic AI, with a strong foundation in scalable backend systems. My engineering philosophy revolves around translating complex research papers into optimized, production-ready code.

🎯 Focus: Bypassing computational bottlenecks in high-resolution (4K) object detection using Explainable AI (XAI).
🤖 AI Engineering: Building local LLM agents that seamlessly interact with third-party ecosystems (Google APIs, etc.).
⚙️ Infrastructure: Architecting robust database migrations and building backend profilers.
💡 Goal: Systems that are not just intelligent, but fast, scalable, and resilient.

🚀 Featured Projects

🌟 Flagship Projects

⚡ PixelQueue

Vision Intelligence Infrastructure: A high-performance, async control panel for human-in-the-loop AI annotation.

A dark-themed control panel built on decoupled ML microservices and task queues, designed to keep the annotation UI responsive while models run in the background.

Key Innovations:

🚀 Asynchronous ML: Non-blocking AI auto-labeling via PyTorch, LayerCAM & YOLO.
⚡ Low-Latency UI: Hardware-accelerated React-Konva staging canvas.
🔄 Decoupled Workers: Horizontal scaling by adding Celery workers behind a message broker.
🔒 Isolated Workspaces: Per-workspace Role-Based Access Control (RBAC).
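
As a rough illustration of the decoupled-worker idea, here is a minimal in-process sketch. It uses Python's standard `asyncio.Queue` as a stand-in for a real Celery broker, and the names `label_worker` and `run_queue` are hypothetical, not PixelQueue's actual code. The producer only enqueues image IDs, and any number of workers drain the queue independently.

```python
import asyncio


async def label_worker(name, tasks, results):
    """Pull image IDs off the queue and record a placeholder label."""
    while True:
        image_id = await tasks.get()
        await asyncio.sleep(0)  # stand-in for model inference (assumption)
        results[image_id] = f"labeled-by-{name}"
        tasks.task_done()


async def run_queue(image_ids, n_workers=2):
    """Enqueue all images, let the workers drain the queue, then stop them."""
    tasks = asyncio.Queue()
    results = {}
    workers = [
        asyncio.create_task(label_worker(f"w{i}", tasks, results))
        for i in range(n_workers)
    ]
    for image_id in image_ids:
        tasks.put_nowait(image_id)
    await tasks.join()  # wait until every queued item is processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(run_queue(["img-1", "img-2", "img-3"]))
```

Raising `n_workers` adds capacity without touching the producer, which is the property a broker-backed setup provides across processes and machines.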

🔦 The Searchlight Protocol

"Finding the needle in the haystack, from 400ft above."

A novel coarse-to-fine computer vision pipeline designed for efficient small object detection in high-resolution (2K/4K) aerial imagery. Tackles the critical trade-off between resolution and latency in drone forensics.

Key Innovations:

Uses LayerCAM to identify semantic "hotspots" before processing.
Slices and zooms into regions of interest, skipping over 80% of empty background.
Outperforms blind sliding-window approaches (SAHI) in both speed and accuracy.
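
The coarse-to-fine step can be sketched roughly as follows. This is a simplified illustration, not the project's actual slicing logic: it assumes a saliency heatmap (such as one produced by LayerCAM) is already available as a 2D list, and the fixed tile grid, the `threshold` value, and the function name `select_hotspot_tiles` are all hypothetical.

```python
def select_hotspot_tiles(heatmap, tile, threshold):
    """Return (row, col, height, width) boxes for tiles whose peak
    activation reaches the threshold; all other tiles are skipped."""
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            peak = max(
                heatmap[i][j]
                for i in range(r, min(r + tile, rows))
                for j in range(c, min(c + tile, cols))
            )
            if peak >= threshold:
                boxes.append((r, c, min(tile, rows - r), min(tile, cols - c)))
    return boxes


# Toy 4x4 heatmap with two active regions (illustrative values only).
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.7],
]
tiles = select_hotspot_tiles(heatmap, tile=2, threshold=0.5)
```

Only the returned boxes would be cropped and passed to the detector at full resolution; the remaining tiles are never processed.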

🎨 Neural Canvas

Transform any image into a masterpiece — in real-time.

A fast neural style transfer implementation that generates stylized images using a feed-forward CNN trained with perceptual loss. Performs instant stylization in a single forward pass.

Key Features:

🚀 Real-time inference with a custom residual architecture.
🧠 Perceptual content & style loss using a pretrained VGG-16 network.
🔁 Instance Normalization integrated for high-quality, artifact-free outputs.
📦 ONNX export supported, ready for edge deployment.
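
For readers unfamiliar with perceptual style loss, a minimal pure-Python sketch of the Gram-matrix comparison is shown here. It assumes feature maps have already been extracted (for example from VGG-16 layers) and flattened to one list per channel; the normalization by the number of spatial positions and the function names are assumptions for illustration, not the repository's exact implementation.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of H*W activations.
    Returns the C x C matrix of channel correlations, averaged over positions."""
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum(
        (g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


# Two channels with two spatial positions each (illustrative values only).
feats = [[1.0, 2.0], [3.0, 4.0]]
gram = gram_matrix(feats)
same_style = style_loss(feats, feats)
```

Because the Gram matrix discards spatial layout and keeps only channel co-activation statistics, matching it transfers texture and color patterns rather than image content.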

📦 More Projects

🧭 pygog (Google CLI Agent)
A powerful CLI for Google services (Gmail, Drive, Calendar). Features a built-in natural language AI agent supporting Gemini, DeepSeek, & OpenAI.
<Python> <Google APIs> <LLM Agents>

📐 Depth Estimation + Semantic Seg.
Multi-modal depth completion using RGB + sparse depth + semantic maps. Features a DepthNet-style encoder-decoder trained on NYU Depth v2 with multi-scale supervision.
<PyTorch> <NYU-Depth-v2> <Encoder-Decoder>

🛠️ Stack Matrix

🌐 Open Source Contributions

I actively contribute to the broader developer ecosystem, with recent merged work spanning agent frameworks, AI infrastructure, developer tooling, and performance-focused ML apps. I am also an active collaborator in the SynapseKit organization:

SynapseKit/SynapseKit: Shipped 109 PRs covering native observability, VoiceAgent audio pipelines, graph-builder tooling, benchmark suites, CronTrigger scheduling, self-healing cost-aware agents, persistent agent memory, multimodal RAG ingestion, knowledge graph retrievers, Discord automation, cloud/data loaders, and local/self-hosted model integrations across 15+ LLM providers.
lancedb/lancedb: Updated LanceDB's Python Gemini embedding provider to the newer google-genai SDK and opened a fix for event loop blocking when AsyncTable.add computes embeddings.
Nikolaev3Artem/fastapi-silk: Merged 5 PRs adding per-endpoint database trigger counters, multi-version compatibility matrix, SQLite+Alembic setup, SQL profiler tests, and comprehensive README documentation.
Bessouat40/RAGLight: Submitted 3 PRs adding MCP server configuration CLI support (in review) and Docling-based high-fidelity PDF ingestion (open).
pydantic/pydantic-ai: Merged an upgrade to the Anthropic code execution tool.

📊 Telemetry

🔗 Connect & Explore

Website • Searchlight Live App • Email Me

Built by DhruvGarg111
