TallMessiWu/README.md

Website

en zh

Hi there! 👋 I'm Junlin Wu

AI Engineer · Agent Developer · Open Source Builder

👨🏻‍💻 About Me

AI Engineer at Huawei, owning quantization and adaptation of multi-modal diffusion models and LLMs on Ascend chips. I build at the intersection of RAG, ReAct agents, and low-level inference optimization — turning ideas into verifiable, reusable engineering pipelines.

  • 🤖 AI Engineer based in Shanghai, China — specializing in LLM inference, Agent systems, and model quantization.
  • 🐱 Cat lover · ⚽ Football · 🎸 Guitar & 🎹 Keyboard · 📷 Photography.
  • ✈️ Travel · ♟️ Chess · ⌨️ DOTA 2 — always up for a good game or adventure.

🎓 Education

  • 🦁 Columbia University — M.S. in Data Science (Sep 2023 - Dec 2024)
    • Focus: Generative AI & LLM Applications. Coursework: Computer Vision, NLP, Statistical Modeling, ML in Practice.
  • 🔱 University of California San Diego — B.S. in Data Science, Minor in Economics (Sep 2019 - Mar 2023)
    • GPA 3.9/4.0 · Cum Laude · Provost Honors (2019-2023)
  • 🏫 The Derryfield School — High School Diploma (Sep 2017 - Jun 2019)
  • 📚 Linfield Christian School — High School (Aug 2015 - Jun 2017)

🏢 Work Experience

  • 🤖 AI Engineer at Huawei Technologies Co., Ltd. (Aug 2025 - Present | Shanghai)

    • Multi-modal Diffusion Model Quantization & Optimization: Led quantization adaptation for Flux, Wan 2.2, and Hunyuan models on next-gen Ascend chips. Achieved a 1.5× inference speedup for Wan 2.2 and Hunyuan, and over 2× for Flux (exceeding the 1.6× target), through operator-level profiling, precision tuning, and fused operator integration. Built an automated, pandas-based DiT-block profiling analysis tool that cut analysis time by ~50% and was adopted as the team standard across 5+ projects.
    • Qwen3.5 End-to-End Adaptation (Model Owner): Fully owned Qwen3.5 adaptation on the vLLM framework for next-gen Ascend chips, covering functionality validation, quantization, operator fusion, PD separation deployment, and end-to-end performance tuning. Coordinated work across the operator, framework, and quantization teams to ensure on-time delivery.
    • SGLang MXFP Quantization: Contributed MXFP4 & MXFP8 quantization support to SGLang across three model scenarios — Diffusion, Dense LLM, and MoE LLM. Leveraged Claude Code for AI-assisted development, completing the originally estimated 2-month workload in ~1 week (~75% efficiency gain), validating AI-assisted engineering for complex framework-level tasks.
    • Engineering Efficiency & Team Impact: Early adopter and advocate of AI-assisted development (Claude Code) within the team: led 2 internal tech talks and authored the team's AI-assisted coding guide. Built a web-based server whitelist management system to improve team resource utilization. Contributed technical documentation and led multiple internal knowledge-sharing sessions on quantization, performance tuning, and AI-assisted development.
  • 🧬 Data Analyst at Columbia University Medical Center (Nov 2023 - Dec 2024 | New York)

    • Designed an end-to-end neuronal activity analysis pipeline for in vivo recording data. Quantified temporal relationships between neural activity and feeding behavior through statistical modeling, supporting the identification of key neuropeptidergic neurons that regulate satiation.
    • Co-author on "Brainstem neuropeptidergic neurons link a neurohumoral axis to satiation" (under review at Cell). Built automated data processing scripts adopted for subsequent similar studies.
  • 🎉 Head of Development at UCSD CSSA (Mar 2020 - Mar 2023 | San Diego)

    • Built the organization's official website with Vue.js, Vite, and Flask. Created programming tutorials for 15 team members and designed an internal knowledge-sharing portal.
    • UCSD CSSA Website · Department Portal

💻 Technical Skills

| Level | Skills |
| --- | --- |
| Expert | Python (Pandas / NumPy / Matplotlib), PyTorch, LLM Inference & Serving (vLLM, SGLang), RAG (LangChain, Pinecone), Agent Development (ReAct, Function Calling, MCP) |
| Proficient | Model Quantization (MXFP / INT8), Prompt Engineering, OpenAI / DeepSeek API, Streamlit, OpenCV, Scikit-learn, SQL, Flask, asyncio / httpx, GitHub REST API |
| Familiar | TensorFlow, Vue.js, TypeScript, D3.js, Java / Spring, Redis, MongoDB, Docker, Git, Linux |

🚀 Featured Projects

| Project | Description |
| --- | --- |
| react-agent | Production-grade ReAct Agent built from scratch with zero framework dependencies, driven by the DeepSeek API. Five-layer architecture with per-tool circuit breakers, exponential backoff with jitter, and graceful degradation. Chain-of-thought visualization via DeepSeek reasoning_content. |
| TD LLM Streamlit | Gen AI Executive Advisor for TD Bank: RAG-powered strategic Q&A with GPT-4o + LangChain + Pinecone Serverless. Semantic sliding window chunking, multi-stage retrieval, hallucination guard, and TruLens LLM-as-Judge evaluation. |
| SGLang Quant Eval | MXFP8/MXFP4 quantization on SGLang for Huawei Ascend NPU, covering Diffusion, Dense LLM, and MoE scenarios with both online quantization and offline pre-quantized weight inference. |
| DOTA 2 Drafting Backend | ML backend for DOTA 2 hero recommendation: 30K+ matches, 821-dim feature engineering, PyTorch / XGBoost / Random Forest ensemble. 51.8% overall accuracy, 58.3% on known counter subsets. |
| DOTA 2 Drafting Frontend | Vue 3 + Element Plus frontend with a captain-mode hero selection interface for DOTA 2 match prediction. |
| Personal Website | Modern responsive portfolio built with Vue 3, TypeScript, and Element Plus, with a VS Code-inspired design, i18n, and dark/light themes. |
| agent-building | Progressive 4-week AI Agent system build: ReAct loops, MCP protocol tooling, resilient fault tolerance, and LangGraph state machines. |
| Iris Recognition | Deep learning-based iris recognition with PyTorch. |
| Climate Analysis | 100 years of climate data analysis and global warming visualization with Python & D3.js. |

📄 Publications

  • Chowdhury, S., Kamatkar, N. G., Wang, W. X., Akerele, C. A., Huang, J., Wu, J., Kane, C. M., Bhave, V. M., Huang, H., Wang, X., & Nectow, A. R. (under review). Brainstem neuropeptidergic neurons link a neurohumoral axis to satiation. Cell.
  • Wu, J. (2022). A WeChat Application Combined with Convolutional Neural Network for Skin Cancer Recognition. International Conference on Artificial Intelligence in Education (ICAIE).

📱 Contact Me

Pinned

  1. agent-building (Public)

    Build a production-grade AI Agent from scratch with zero frameworks: four weeks of hand-written ReAct / MCP / LangGraph / AutoGen / multi-agent collaboration.

    Jupyter Notebook 2

  2. td-llm-streamlit (Public)

    Jupyter Notebook 1

  3. sglang_quant_eval (Public)

    Research and evaluation of the workload of adapting MXFP8 quantization to SGLang.

    Python

  4. dota2-drafting-backend (Public)

    Jupyter Notebook

  5. personal-website (Public)

    A modern personal website built with Vue 3 & TS, featuring a VS Code-style aesthetic, i18n support, and dark/light mode toggle.

    Vue 1

  6. personal-website-admin (Public)

    The backend management system for my personal website, built with Vue 3, TypeScript, Vite, and powered by Tencent CloudBase.

    Vue 1