RealComputer/GlassKit

GlassKit

Build smart AI apps for smart glasses, fast.

GlassKit is an open-source dev suite for building vision-enabled smart glasses apps. It provides SDKs and backends that turn real-time camera and microphone streams into specialized AI responses and actions, tailored to your workflow.

Today, this repository focuses on end-to-end examples you can adapt. Next up: reusable SDKs and a production-ready backend.

Examples/Templates you can use

  • IKEA assembly assistant (demo.webm) · Code ➡️ · Code (+ RF-DETR) ➡️
    Vision-enabled voice assistant for Rokid Glasses. Streams mic + camera to the OpenAI Realtime API over WebRTC for spoken IKEA assembly guidance. The RF-DETR variant adds object detection for stronger visual understanding.

  • Sushi speedrun HUD (demo.webm) · Code ➡️
    Real-world speedrun HUD for Rokid Glasses. Streams video over WebRTC with a data channel to the backend, which runs a fine-tuned RF-DETR object detector for automatic, hands-free split completion along a configured route.

  • Drink-making coach (demo.webm) · Code ➡️
    Proactive drink-making assistant for Rokid Glasses. Streams live camera video to Overshoot for scene understanding and uses the OpenAI Realtime API for low-latency spoken guidance and transcript streaming.

  • Privacy filter (demo.mp4) · Code ➡️
    Real-time privacy filter that sits between the camera and your app. Anonymizes the faces of people who have not given consent, detects and remembers verbal consent, and runs locally with recording support.

  • Scene-description HUD (demo.webm) · Code ➡️
    Simple Rokid Glasses app that streams camera video to Overshoot and shows live inference text on the HUD.

  • Voice command + phone support (demo.mp4) · Code ➡️
    Reference app for Rokid Glasses voice commands and Android phone/emulator support. Includes camera, microphone, speaker, and menu-screen patterns, with touchscreen controls that mirror the Rokid touchpad.
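Of the patterns above, the speedrun HUD's hands-free split completion is the most algorithmic: a route is an ordered list of detection targets, and a split completes when the detector reports the label the runner is currently waiting on. A minimal sketch of that idea — the route format, function name, and detector output here are illustrative assumptions, not the example's actual schema:

```python
# Hedged sketch of detector-driven split completion. The route format
# and label stream below are assumptions for illustration only.

def advance_splits(route: list[str], detections: list[str]) -> int:
    """Return how many splits of `route` are complete after the given
    stream of detected object labels (e.g. from an RF-DETR model)."""
    done = 0
    for label in detections:
        if done < len(route) and label == route[done]:
            done += 1  # current split's target was seen: split completes
    return done

# A three-split sushi route: "knife" is ignored, then "rice" and "fish"
# complete the first two splits in order.
route = ["rice", "fish", "plate"]
print(advance_splits(route, ["knife", "rice", "fish"]))  # → 2
```

A real HUD would run this incrementally over the data-channel detection stream and debounce spurious detections, but the route-as-ordered-labels shape is the core of it.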

Why GlassKit

Smart glasses apps are hard.

  • Generic vision-capable LLMs often fail at real-world task support.
  • Each glasses brand has different hardware, form factors, and frameworks.
  • Real-time camera + mic streaming is non-trivial to build correctly and ergonomically.

GlassKit is built around:

  • Vision model orchestration: choose the right mix of multimodal LLMs and object detectors for the job.
  • Visual context management: define what the AI should know and how it is represented.
  • Real-time streaming: camera + mic in, responses out, with sane developer ergonomics.

How it works

You define your AI with visual/textual context and your business logic. Then your app works like this:

  1. Camera frames and audio stream from the glasses to the backend via the SDK
  2. The backend processes inputs using vision models and LLMs with your custom context + logic
  3. Responses stream back to the glasses and the wearer via the SDK

You handle the app logic. GlassKit handles the glasses-to-AI pipeline.
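The three steps above can be sketched as a simple loop. None of the names below come from the GlassKit SDK — they are stand-ins that only illustrate the shape of the pipeline:

```python
# Hypothetical sketch of the glasses-to-AI pipeline: frames stream in
# (step 1), a backend applies models + your context (step 2), and
# responses stream back out (step 3). All names here are assumptions.

from dataclasses import dataclass


@dataclass
class Frame:
    """One camera frame plus audio captured since the previous frame."""
    image: bytes
    audio: bytes = b""


@dataclass
class Backend:
    """Stands in for step 2: vision models + LLM with your context."""
    context: str

    def infer(self, frame: Frame) -> str:
        # A real backend would run detectors/LLMs here; we fake a reply.
        return f"[{self.context}] processed {len(frame.image)}-byte frame"


def run_pipeline(frames, backend):
    """Step 1: stream frames in; step 3: yield responses back out."""
    for frame in frames:
        yield backend.infer(frame)


backend = Backend(context="ikea-assembly")
for hud_text in run_pipeline([Frame(image=b"\x00" * 640)], backend):
    print(hud_text)  # → [ikea-assembly] processed 640-byte frame
```

In the real system the transport is WebRTC rather than an in-process generator, but the division of labor is the same: your code supplies the context and business logic, the pipeline moves frames and responses.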

Getting started

  1. Pick an example from examples/
  2. Open its README and follow the setup steps
  3. Run it, then modify for your workflow

Status and roadmap

GlassKit is early and under active development, but the examples are usable today.

  • Current focus: end-to-end templates you can clone and adapt
  • Coming next: reusable SDKs + production-ready backends
  • Developer experience: demo-video recording tooling, plus observability and debugging tools
  • Platform support today: Rokid Glasses
  • Planned support: Meta glasses, Android XR, Mentra, and more

Contributing

Contributions are welcome!

By submitting a pull request, you agree that your contribution is licensed under the MIT License of this project (see LICENSE), and you confirm that you have the right to submit it under those terms.
