
RT64M/gesture-data_collection


Gesture Data Collection and Evaluation

Launch the browser demo

Live Demo:

Gesture Data Collection and Evaluation is a research prototype for collecting natural human gestures over computer interface screenshots and evaluating multimodal models on gesture-intent understanding.

The project has two main surfaces:

  • A FastAPI + Vue web app with a researcher admin UI and a participant collection UI.
  • A CLI evaluation runner that sends gesture videos and screenshots to target models, then scores predicted intent with a judge model.

This public repository includes source code, tests, prompt templates, and small synthetic fixtures. It intentionally does not include local collection databases, real participant videos, private questionnaire responses, run outputs, or API keys.

Features

  • Participant collection flow at /collect.
  • Researcher admin UI at /admin.
  • Screenshot library and collection enable/disable controls.
  • Gesture recording with target-region selection and free-text gesture/intent descriptions.
  • Questionnaire templates and responses.
  • Collection package preview/import support.
  • Model evaluation modes: single, two_stage, video_only, both, all, and openrouter_only.
  • Providers: mock, OpenAI/GPT, Gemini/Gemma, Qwen, DeepSeek judge, OpenRouter, OpenAI-compatible endpoints, and Ollama.
  • JSONL dataset manifests and structured run outputs.
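JSONL manifests hold one JSON object per line. Below is a minimal standard-library loader sketch. The README does not document the manifest schema, so `iter_manifest` is a hypothetical helper and the caller chooses which keys are `required`; check examples/dataset.jsonl for the real field names.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_manifest(path: Path, required: tuple[str, ...] = ()) -> Iterator[dict]:
    """Yield one record per non-blank line of a JSONL manifest.

    `required` lists keys every record must contain. It is left to the
    caller because the real schema lives in examples/dataset.jsonl.
    """
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"{path}:{lineno}: missing keys {missing}")
            yield record
```

Usage: `records = list(iter_manifest(Path("examples/dataset.jsonl")))`. Errors report the file and line number, which makes hand-edited manifests easier to fix.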

Repository Contents

  • gesture_eval/: Python package, FastAPI app, CLI runner, providers, database helpers, prompts, and evaluation core.
  • frontend/src/: Vue 3 source for admin and collection pages.
  • examples/: small synthetic fixtures for smoke tests and schema examples.
  • prompts/: target-model and judge prompt templates.
  • tests/: Python unittest coverage.
  • scripts/: safe utility scripts.
  • docs/images/: public-safe synthetic screenshots used in this README.

The following are intentionally excluded from Git and should remain local:

  • .env
  • data/
  • runs/
  • output/
  • .tmp-tests/
  • .uv-cache/
  • .venv/
  • frontend/node_modules/
  • gesture_eval/web/static/app/
  • real collected videos, local SQLite databases, questionnaires, and API keys
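One way to confirm a clone still ignores these paths is to compare .gitignore against the list above. This is a minimal sketch, assuming each path appears as a literal line in .gitignore. It does not expand glob patterns (`git check-ignore` is the authoritative tool for that), and the helper name is hypothetical.

```python
# Paths listed above that must stay local. The free-text item about real
# videos, databases, questionnaires, and keys is not a path, so it is omitted.
REQUIRED_IGNORES = (
    ".env",
    "data/",
    "runs/",
    "output/",
    ".tmp-tests/",
    ".uv-cache/",
    ".venv/",
    "frontend/node_modules/",
    "gesture_eval/web/static/app/",
)


def missing_ignore_entries(gitignore_text: str) -> list[str]:
    """Return required entries that have no literal match in .gitignore.

    Leading and trailing slashes are ignored when comparing, so "/data"
    and "data/" both satisfy "data/". Wildcards are NOT expanded.
    """
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        present.add(line.strip("/"))
    return [entry for entry in REQUIRED_IGNORES if entry.strip("/") not in present]
```

Usage: `missing_ignore_entries(open(".gitignore", encoding="utf-8").read())` returns an empty list when every entry is covered.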

Screenshots

The images below are synthetic AI-generated desktop/application screenshots used as public examples.

Synthetic calendar task screenshot

Synthetic spreadsheet filter screenshot

Synthetic kanban board screenshot

Static Public Demo

This repository also includes an English-only GitHub Pages showcase at docs/index.html. It is designed for direct browser launch and does not require the FastAPI server.

The Pages showcase links to two static demos that mirror the real /collect and /admin surfaces as closely as possible without a backend:

  • docs/demo/index.html: participant collection, cross evaluation using the same browser-local clip, questionnaire, and simulated upload.
  • docs/researcher/index.html: research console dashboard, Cloudflare tunnel panel, provider settings, collection analysis, evaluation setup, and run detail preview.

For GitHub Pages, publish the repository from the docs/ folder and open the Pages root. For local preview, open docs/index.html in a current desktop browser.

Quick Start

Install Python dependencies with uv:

UV_CACHE_DIR=.uv-cache uv sync

Install frontend dependencies:

cd frontend
npm install

Build the frontend before serving the production web app:

cd frontend
npm run build

Run the Web App

From the repository root:

UV_CACHE_DIR=.uv-cache uv run python -m gesture_eval.web serve \
  --db data/gesture_data.sqlite \
  --host 127.0.0.1 \
  --port 8765

Open:

  • Admin UI: http://127.0.0.1:8765/admin
  • Collection UI: http://127.0.0.1:8765/collect
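When scripting a session, for example before starting a tunnel, it can help to wait until the server answers. This is a minimal standard-library sketch with a hypothetical helper name. It only checks that the URL returns some HTTP response, not that the app is fully healthy.

```python
import time
import urllib.error
import urllib.request

# Ignore http_proxy/https_proxy settings: the target is a local server.
_LOCAL_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until the server sends any HTTP response or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _LOCAL_OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            err.close()
            return True  # an error status still means the server is listening
        except OSError:
            pass  # connection refused, timed out, or reset: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage: `wait_for_server("http://127.0.0.1:8765/collect")`. `HTTPError` is caught before the broader `OSError` because it is a subclass of `URLError`, which is itself an `OSError`.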

The app creates local SQLite and media files under data/. That directory is ignored by Git because it may contain participant data.
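The Data and Privacy Notes state that SQLite stores media paths rather than video/image blobs. Here is a sketch of that pattern using the standard `sqlite3` module. The `recordings` table and `save_recording` helper are hypothetical illustrations, not the project's schema.

```python
import sqlite3
from pathlib import Path

# Hypothetical table for illustration; the project's real schema lives in
# its own database helpers and is not reproduced here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id INTEGER PRIMARY KEY,
    media_path TEXT NOT NULL
)
"""


def save_recording(conn: sqlite3.Connection, data_dir: Path, name: str, payload: bytes) -> int:
    """Write the media file under data_dir and store only its relative path."""
    # Reject names that could escape the media directory ("..", "a/b", "").
    if not name or name in (".", "..") or Path(name).name != name:
        raise ValueError(f"unsafe file name: {name!r}")
    target = data_dir / "media" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    rel_path = target.relative_to(data_dir).as_posix()
    with conn:  # commits the insert, or rolls back on error
        cur = conn.execute("INSERT INTO recordings (media_path) VALUES (?)", (rel_path,))
    return cur.lastrowid
```

Storing paths relative to data/ keeps the database small and lets the whole directory be moved or deleted as one unit.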

The helper scripts provide the same local flow:

./start.sh

Windows PowerShell:

.\start.ps1

Cloudflare Tunnel for Participant Collection

For short-lived remote collection sessions, run the local FastAPI app and expose it with a temporary Cloudflare tunnel:

cloudflared tunnel --url http://127.0.0.1:8765

The admin dashboard also includes tunnel status/start/stop controls when cloudflared is available. Share only the /collect link with participants. Keep /admin protected with GESTURE_ADMIN_PASSWORD in any shared environment.
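A quick tunnel prints its public https://<random-words>.trycloudflare.com address in cloudflared's log output, which typically goes to stderr. This sketch extracts that address and derives the participant link. The regex assumes the current trycloudflare.com hostname format, which cloudflared could change. The helper names are hypothetical and are not the admin dashboard's implementation.

```python
import re
from typing import Iterable, Optional

# Assumption: quick-tunnel hostnames look like https://<words>.trycloudflare.com
TUNNEL_URL_RE = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def find_tunnel_url(lines: Iterable[str]) -> Optional[str]:
    """Return the first trycloudflare.com URL found in cloudflared log lines."""
    for line in lines:
        match = TUNNEL_URL_RE.search(line)
        if match:
            return match.group(0)
    return None


def participant_link(tunnel_url: str) -> str:
    """Build the link to share: only /collect, never /admin."""
    return tunnel_url.rstrip("/") + "/collect"
```

Usage: pass the lines of `subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:8765"], stderr=subprocess.PIPE, text=True).stderr` to `find_tunnel_url`, then share `participant_link(...)`.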

Model Providers and API Keys

Copy .env.example to .env for local credentials:

cp .env.example .env

Supported environment variables include:

  • OPENAI_API_KEY
  • GEMINI_API_KEY
  • GOOGLE_GEMINI_API_KEY
  • OPENROUTER_API_KEY
  • OPENROUTER_BASE_URL
  • QWEN_API_KEY
  • DEEPSEEK_API_KEY
  • MULTIMODAL_API_KEY
  • GESTURE_ADMIN_PASSWORD
  • GESTURE_ADMIN_COOKIE_SECURE

The prototype can write provider settings to .env from the admin UI. Do not commit .env.
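The project may load .env with its own code. As an illustration only, here is a minimal parser that reports which of the variables listed above are set, without printing their values. It handles comments, blank lines, an optional `export ` prefix, and matching quotes. It does not handle multi-line values or variable expansion, which a library such as python-dotenv covers. The helper names are hypothetical.

```python
import os
from pathlib import Path

KNOWN_KEYS = (
    "OPENAI_API_KEY", "GEMINI_API_KEY", "GOOGLE_GEMINI_API_KEY",
    "OPENROUTER_API_KEY", "OPENROUTER_BASE_URL", "QWEN_API_KEY",
    "DEEPSEEK_API_KEY", "MULTIMODAL_API_KEY",
    "GESTURE_ADMIN_PASSWORD", "GESTURE_ADMIN_COOKIE_SECURE",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from .env-style text."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop one pair of matching quotes
        values[key] = value
    return values


def configured_keys(env_file: Path) -> list[str]:
    """Names of known variables set (non-empty) in the file or the process env."""
    file_values = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    return [k for k in KNOWN_KEYS if file_values.get(k) or os.environ.get(k)]
```

Usage: `print(configured_keys(Path(".env")))` lists variable names only, so the output is safe to paste into a bug report.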

CLI Evaluation Smoke Test

Run the mock provider against the synthetic example dataset:

UV_CACHE_DIR=.uv-cache uv run python -m gesture_eval.cli \
  --dataset examples/dataset.jsonl \
  --target-model mock-vlm \
  --provider mock \
  --mode both \
  --judge-provider mock \
  --judge-model mock-judge \
  --output-dir runs/smoke
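To drive the same smoke test from Python, for example inside another test harness, the command can be built as an argument list and passed to `subprocess.run`. This sketch reuses only the flags shown above. The function names are hypothetical, and it assumes `uv` is on PATH and the working directory is the repository root.

```python
import os
import subprocess


def smoke_test_command(output_dir: str = "runs/smoke") -> list[str]:
    """Arguments for the mock-provider smoke test, mirroring the README command."""
    return [
        "uv", "run", "python", "-m", "gesture_eval.cli",
        "--dataset", "examples/dataset.jsonl",
        "--target-model", "mock-vlm",
        "--provider", "mock",
        "--mode", "both",
        "--judge-provider", "mock",
        "--judge-model", "mock-judge",
        "--output-dir", output_dir,
    ]


def run_smoke_test(output_dir: str = "runs/smoke") -> None:
    """Run from the repository root; raises CalledProcessError on failure."""
    env = {**os.environ, "UV_CACHE_DIR": ".uv-cache"}
    subprocess.run(smoke_test_command(output_dir), env=env, check=True)
```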

Outputs include:

  • call_logs.jsonl
  • results.jsonl
  • summary.json
  • detailed_report.json
  • paper_report.md

runs/ is ignored because real runs can include prompts, model responses, media paths, and local evaluation metadata.
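Here is a minimal sketch for inspecting a finished run directory. It assumes only what the file names imply: each .jsonl file holds one JSON object per line, and summary.json is a single JSON document. It does not assume any field names, since the README does not document them. The helper name is hypothetical.

```python
import json
from pathlib import Path


def load_run(run_dir: Path) -> dict:
    """Load summary.json and count non-blank lines in the JSONL outputs of a run."""
    def count_jsonl(name: str) -> int:
        path = run_dir / name
        if not path.exists():
            return 0
        with path.open(encoding="utf-8") as fh:
            return sum(1 for line in fh if line.strip())

    summary_path = run_dir / "summary.json"
    summary = json.loads(summary_path.read_text(encoding="utf-8")) if summary_path.exists() else None
    return {
        "summary": summary,
        "results": count_jsonl("results.jsonl"),
        "call_logs": count_jsonl("call_logs.jsonl"),
    }
```

Usage: `load_run(Path("runs/smoke"))` after the smoke test gives a quick sanity check that results were written.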

Synthetic Screenshot Generation

Dry-run synthetic screenshot generation:

UV_CACHE_DIR=.uv-cache uv run python -m gesture_eval.screenshot_cli \
  --core-intent "archive the highlighted email" \
  --ui-category email \
  --output-dir runs/generated_screenshots \
  --mock

Real image generation requires an OpenAI-compatible Images API key:

MULTIMODAL_API_KEY=<your-key> UV_CACHE_DIR=.uv-cache uv run python -m gesture_eval.screenshot_cli \
  --core-intent "open the selected user's profile" \
  --ui-category people_grid \
  --image-base-url https://example.com/v1 \
  --use-prompt-model \
  --count 3
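This sketch chooses between the dry-run and real-generation commands shown above, depending on whether MULTIMODAL_API_KEY is set. Without credentials, nothing is sent to an image API. It only recombines the documented flags, exactly as each command uses them. The helper name and the fallback policy are assumptions, not part of gesture_eval.screenshot_cli.

```python
import os
from typing import Mapping, Optional


def screenshot_command(core_intent: str, ui_category: str,
                       env: Optional[Mapping[str, str]] = None,
                       image_base_url: str = "https://example.com/v1",
                       count: int = 3) -> list[str]:
    """Build a gesture_eval.screenshot_cli command, using --mock when no key is set."""
    env = os.environ if env is None else env
    cmd = ["uv", "run", "python", "-m", "gesture_eval.screenshot_cli",
           "--core-intent", core_intent, "--ui-category", ui_category]
    if not env.get("MULTIMODAL_API_KEY"):
        # Dry run, as in the first README command.
        return cmd + ["--output-dir", "runs/generated_screenshots", "--mock"]
    # Real generation, as in the second README command.
    return cmd + ["--image-base-url", image_base_url,
                  "--use-prompt-model", "--count", str(count)]
```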

Generated screenshots should be reviewed before becoming public examples.

Data and Privacy Notes

  • SQLite stores media paths, not video/image blobs.
  • Public collection routes should expose only screenshots marked as collection-enabled.
  • Collection sessions use server-generated tokens for session-scoped access.
  • Admin routes and evaluation routes should be protected with GESTURE_ADMIN_PASSWORD when shared beyond local development.
  • API keys may be stored in .env during prototype use.
  • Real videos, questionnaires, user/session data, and run outputs are not part of this public repository.
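The notes above say that sessions use server-generated tokens and that admin routes should be gated by GESTURE_ADMIN_PASSWORD, but not how. This standard-library sketch illustrates both ideas and is not the project's code. `secrets.token_urlsafe` produces an unguessable token, and `hmac.compare_digest` compares secrets in constant time to avoid timing leaks. The function names are hypothetical.

```python
import hmac
import os
import secrets
from typing import Optional


def new_session_token(nbytes: int = 32) -> str:
    """URL-safe random token suitable for a per-session collection link."""
    return secrets.token_urlsafe(nbytes)


def admin_password_ok(supplied: str, expected: Optional[str] = None) -> bool:
    """Check a submitted admin password against GESTURE_ADMIN_PASSWORD.

    Returns False when no password is configured, so a missing setting never
    means open access (a policy choice made for this sketch).
    """
    if expected is None:
        expected = os.environ.get("GESTURE_ADMIN_PASSWORD", "")
    if not expected:
        return False
    # Compare bytes so non-ASCII passwords are accepted by compare_digest.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```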

Tests

Python:

UV_CACHE_DIR=.uv-cache uv run python -m unittest discover -s tests -v
UV_CACHE_DIR=.uv-cache uv run python -m compileall gesture_eval tests

Frontend:

cd frontend
npm test -- --run
npm run build

Public release check:

UV_CACHE_DIR=.uv-cache uv run python scripts/check_public_release.py
