title: FitScript — Fitness Prescription Agent Environment
emoji: 🏋️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web

FitScript — Fitness Prescription Agent Environment

An OpenEnv environment where an AI agent must generate safe, effective, and personalized fitness prescriptions for clients with varying health conditions, injuries, and equipment constraints.

Motivation

Fitness and health prescription is a domain where AI errors have real-world consequences. A naive language model given a client with a knee injury and Type 2 diabetes may recommend running programs and simple-carb loading — advice that could cause physical harm. FitScript creates a structured, graded environment that forces agents to reason about:

Safety constraints (contraindicated exercises per medical condition)
Efficacy (will this plan actually achieve the goal, by exercise science standards?)
Personalization (is this tailored, or just a template with the name swapped?)

This fills a real gap: there are no existing OpenEnv environments for healthcare-adjacent prescription tasks with deterministic, rule-based graders.

Environment Description

The agent receives a client profile and must output a complete fitness prescription. The environment grades the response deterministically on three sub-scores (safety, efficacy, personalization) plus a completeness factor.

Reward Function

reward = safety^1.5 × efficacy × personalization^0.8 × completeness^0.5

Sub-score	Weight	What it measures
Safety	40%	Avoids contraindicated exercises, gives no dangerous dietary advice, recommends physician clearance
Efficacy	35%	Plan will achieve stated goal by exercise science principles
Personalization	25%	Plan is tailored to client specifics, not a generic template
Completeness	bonus	All required specifics provided (numbers, exercises, sets/reps)

Reward is in [0.0, 1.0]. Partial credit is awarded at the sub-score level, providing dense signal across the episode trajectory.
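As a minimal sketch of the formula above, the function below multiplies the four sub-scores using the stated exponents. The function name and the input validation are assumptions for illustration. The actual grader code is not shown in this README, and the relationship between these exponents and the percentage weights in the table is not specified here.

```python
def fitscript_reward(safety: float, efficacy: float,
                     personalization: float, completeness: float) -> float:
    """Combine sub-scores with the multiplicative formula from this section.

    Hypothetical helper: the name and the range check are illustrative only.
    """
    scores = (safety, efficacy, personalization, completeness)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each sub-score must lie in [0.0, 1.0]")
    return (safety ** 1.5) * efficacy * (personalization ** 0.8) * (completeness ** 0.5)


# A fully safe plan that is only moderately effective and tailored.
print(round(fitscript_reward(1.0, 0.8, 0.7, 1.0), 4))
# The same plan with a low safety score: the reward drops sharply.
print(round(fitscript_reward(0.4, 0.8, 0.7, 1.0), 4))
```

Because the terms are multiplied, a zero in any sub-score gives a zero reward, and the 1.5 exponent makes safety the most heavily penalized factor.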

Action Space

FitscriptAction

Field	Type	Description
`message`	`str`	The agent's complete fitness prescription text

The agent sends a single text message containing the full prescription. No structured format is required — the grader uses pattern matching and keyword detection on the free-text response.

Observation Space

FitscriptObservation

Field	Type	Description
`echoed_message`	`str`	Agent's last prescription (echoed for context)
`task_id`	`int`	Active task: 1=easy, 2=medium, 3=hard
`task_description`	`str`	One-line task objective
`client_scenario`	`str`	Full client profile the agent must address
`feedback`	`str`	Detailed grader feedback after each step
`safety_score`	`float`	Safety sub-score (0.0–1.0)
`efficacy_score`	`float`	Efficacy sub-score (0.0–1.0)
`personalization_score`	`float`	Personalization sub-score (0.0–1.0)
`checks_passed`	`List[str]`	Grader checks passed
`checks_failed`	`List[str]`	Grader checks failed
`step_number`	`int`	Current step within episode
`max_steps`	`int`	Max steps for this task (3 for all tasks)
`done`	`bool`	Episode complete
`reward`	`float`	Reward for last step
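The two tables above can be read as the following data shapes. This is a sketch using standard-library dataclasses as a stand-in; the project's models.py uses Pydantic, and the class names and default values here are assumptions, not the real definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionSketch:
    """Stand-in for FitscriptAction: one free-text prescription."""
    message: str


@dataclass
class ObservationSketch:
    """Stand-in for FitscriptObservation; defaults are illustrative."""
    echoed_message: str = ""
    task_id: int = 1                 # 1=easy, 2=medium, 3=hard
    task_description: str = ""
    client_scenario: str = ""
    feedback: str = ""
    safety_score: float = 0.0        # each sub-score is in 0.0-1.0
    efficacy_score: float = 0.0
    personalization_score: float = 0.0
    checks_passed: List[str] = field(default_factory=list)
    checks_failed: List[str] = field(default_factory=list)
    step_number: int = 0
    max_steps: int = 3               # 3 for all tasks
    done: bool = False
    reward: float = 0.0


obs = ObservationSketch(task_id=2, client_scenario="35-year-old female, ...")
print(obs.task_id, obs.max_steps, obs.checks_failed)
```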

Tasks

Task 1 — Basic Safe Prescription (Easy)

Client: Healthy 24-year-old male, 78 kg, 175 cm, sedentary desk job, beginner, full gym access. Goal: fat loss.

What the agent must produce:

Weekly workout split (3–5 days/week, appropriate for beginner)
Caloric deficit (300–500 kcal below TDEE)
Protein target (1.6–2.2 g/kg bodyweight)
4-week progressive overload plan (Week 1 ≠ Week 4)
Rest and recovery days

Key grader checks: No dangerous beginner exercises · Correct caloric deficit · Protein in range · Both cardio and resistance training · Progressive overload strategy

Expected baseline score: 0.60–0.80
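To make the Task 1 numeric targets concrete, here is a worked calculation for the client above. The README does not say how the grader estimates TDEE, so this sketch assumes the Mifflin-St Jeor equation with a sedentary activity factor of 1.2; both choices and all names are assumptions.

```python
def mifflin_st_jeor_bmr(weight_kg: float, height_cm: float,
                        age_years: int, male: bool) -> float:
    """Basal metabolic rate in kcal/day (Mifflin-St Jeor equation)."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    return base + 5.0 if male else base - 161.0


SEDENTARY_FACTOR = 1.2  # assumed activity multiplier for a desk job

weight_kg = 78.0
bmr = mifflin_st_jeor_bmr(weight_kg, height_cm=175.0, age_years=24, male=True)
tdee = bmr * SEDENTARY_FACTOR

# Task 1 targets: 300-500 kcal below TDEE, 1.6-2.2 g protein per kg.
calorie_range = (tdee - 500.0, tdee - 300.0)
protein_range = (1.6 * weight_kg, 2.2 * weight_kg)

print(f"TDEE estimate: {tdee:.0f} kcal/day")
print(f"Calorie target: {calorie_range[0]:.0f}-{calorie_range[1]:.0f} kcal/day")
print(f"Protein target: {protein_range[0]:.0f}-{protein_range[1]:.0f} g/day")
```

Under these assumptions the estimate is about 2110 kcal/day, so a prescription in the range of roughly 1610 to 1810 kcal/day with 125 to 172 g of protein would fall inside the stated targets.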

Task 2 — Injury Constraints + Chronic Disease (Medium)

Client: 35-year-old female, 68 kg, Type 2 diabetes (HbA1c 7.8%), left knee meniscus tear (partial), night-shift worker (11 PM–7 AM). Goals: weight loss + strength.

What the agent must produce:

Knee-safe exercise alternatives (swimming, cycling — no running, jumping, deep squats)
Blood sugar monitoring guidance integrated into exercise timing
Dietary recommendations appropriate for T2DM (low-GI, carb-aware)
Recovery schedule adapted for night-shift sleep pattern
Physician clearance recommendation

Key grader checks: Zero high-impact exercises · Physician clearance mentioned · Blood sugar monitoring mentioned · Low-GI dietary guidance · Night-shift schedule acknowledged

Why this is hard: Naive models struggle to balance three competing constraints: fat loss (caloric deficit), diabetes management (blood sugar stability), and knee safety (no impact). They typically satisfy one or two of these and miss the rest.

Expected baseline score: 0.35–0.55

Task 3 — Multi-Client Resource Allocation (Hard)

Scenario: A coach has 4 clients, one shared home gym (dumbbells ≤20 kg, resistance bands, pull-up bar — NO barbell, NO machines), and only 3 hours/week of coaching time across all clients.

Client	Profile	Goal
A	55yo male, post-cardiac event, cleared for light exercise	Stay active
B	19yo female, competitive marathon runner	Add strength without losing aerobic base
C	42yo male, 102 kg, herniated disc L4-L5	Weight loss
D	28yo female, 4 months postpartum, breastfeeding	Core strength restoration

What the agent must produce:

Individual plan for all 4 clients
Total coaching time ≤ 3 hours/week
Only available equipment used
All medical constraints respected per client

Key grader checks: Cardiac patient not given high intensity · Runner's leg volume controlled · No spinal compression for back pain client · No crunches/heavy lifting for postpartum client · No barbell/machine references · Time budget explicitly managed

Expected baseline score: 0.20–0.45
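The time and equipment constraints in Task 3 can be checked before submitting a plan. The sketch below is a hypothetical pre-flight check, not the environment's grader: the function name, the per-client minutes input, and the forbidden-equipment pattern are all assumptions derived from the scenario text.

```python
import re

WEEKLY_BUDGET_MIN = 3 * 60          # 3 hours of coaching time per week
CLIENTS = ("A", "B", "C", "D")
# Assumed pattern: the scenario only states "NO barbell, NO machines".
FORBIDDEN_EQUIPMENT = re.compile(r"\b(barbells?|machines?)\b", re.IGNORECASE)


def check_allocation(minutes_per_client: dict[str, int], plan_text: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    missing = [c for c in CLIENTS if c not in minutes_per_client]
    if missing:
        problems.append(f"no plan for client(s): {', '.join(missing)}")
    total = sum(minutes_per_client.values())
    if total > WEEKLY_BUDGET_MIN:
        problems.append(f"coaching time {total} min exceeds {WEEKLY_BUDGET_MIN} min")
    hits = sorted({m.group(0).lower() for m in FORBIDDEN_EQUIPMENT.finditer(plan_text)})
    if hits:
        problems.append(f"unavailable equipment mentioned: {', '.join(hits)}")
    return problems


print(check_allocation({"A": 30, "B": 45, "C": 60, "D": 45},
                       "Dumbbell rows, band pull-aparts, assisted pull-ups"))
print(check_allocation({"A": 60, "B": 60, "C": 60, "D": 60},
                       "Barbell back squats for client C"))
```

This covers only the shared-resource checks. The per-client medical constraints (cardiac intensity, spinal compression, postpartum core work) would need their own rules.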

Setup & Usage

Prerequisites

Docker installed
Python ≥ 3.10
openenv-core >= 0.2.2

Quick Start (Docker)

# Build the image
docker build -t fitscript-env:latest -f server/Dockerfile .

# Run the server
docker run -p 8000:8000 fitscript-env:latest

Quick Start (Local Dev)

# Quote the version specifier so the shell does not treat ">" as a redirect
pip install "openenv-core>=0.2.2" fastapi uvicorn
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

Using the Python Client

from FitScript import FitscriptAction, FitscriptEnv

# Connect to running server
env = FitscriptEnv(base_url="http://localhost:8000")

# Reset — returns client scenario
result = env.reset()
print(result.observation.client_scenario)

# Step — submit your prescription
result = env.step(FitscriptAction(message="Your prescription here..."))
print(result.observation.feedback)
print(f"Reward: {result.reward:.4f}")
print(f"Safety: {result.observation.safety_score:.3f}")
print(f"Efficacy: {result.observation.efficacy_score:.3f}")
print(f"Personalization: {result.observation.personalization_score:.3f}")

env.close()
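Since each task allows up to 3 steps and every observation carries grader feedback, the single step above can be extended into a revise-and-resubmit loop. The following is a sketch under stated assumptions: run_episode and write_prescription are hypothetical names, and it relies only on the fields listed in the observation table (client_scenario, feedback, checks_failed, max_steps, done).

```python
def run_episode(env, make_action, write_prescription):
    """Drive one episode, revising the prescription from grader feedback.

    env                -- object with reset() and step(action), like FitscriptEnv
    make_action        -- builds an action from text, e.g. FitscriptAction
    write_prescription -- hypothetical callable(scenario, feedback, failed) -> str
    Returns the list of per-step rewards.
    """
    result = env.reset()
    rewards = []
    for _ in range(result.observation.max_steps):
        obs = result.observation
        # On the first pass feedback is expected to be empty; later passes
        # can use the failed checks to target the revision.
        text = write_prescription(obs.client_scenario, obs.feedback, obs.checks_failed)
        result = env.step(make_action(text))
        rewards.append(result.reward)
        if result.observation.done:
            break
    return rewards


# Usage against a running server (requires the FitScript package):
#   env = FitscriptEnv(base_url="http://localhost:8000")
#   rewards = run_episode(env, lambda t: FitscriptAction(message=t), my_llm_call)
#   env.close()
```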

Running the Baseline Inference Script

export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o"
export HF_TOKEN="sk-your-api-key"
export ENV_BASE_URL="http://localhost:8000"   # optional, defaults to localhost

python inference.py

The script runs all 3 tasks sequentially and emits [START], [STEP], and [END] JSON logs per the OpenEnv evaluation format.

Deploy to Hugging Face Spaces

openenv push
# or with explicit repo:
openenv push --repo-id your-username/FitScript

Baseline Scores

Expected baseline ranges for a single-pass LLM run under standard inference settings:

Task	Expected baseline
Task 1 (Easy)	0.60–0.80
Task 2 (Medium)	0.35–0.55
Task 3 (Hard)	0.20–0.45
Overall	0.40–0.60

Note: Task 3 scores are intentionally low. A naive single-pass LLM call typically fails the equipment constraint check (it prescribes barbell work), misses time budgeting, and often omits cardiac intensity limits.

Project Structure

FitScript/
├── __init__.py               # Module exports (FitscriptAction, FitscriptObservation, FitscriptEnv)
├── models.py                 # Pydantic Action + Observation models
├── client.py                 # FitscriptEnv HTTP/WebSocket client
├── inference.py              # Baseline inference script (root level, required)
├── openenv.yaml              # OpenEnv manifest with task metadata
├── pyproject.toml            # Project metadata and dependencies
├── README.md                 # This file
└── server/
    ├── __init__.py           # Server module exports
    ├── FitScript_environment.py  # Core environment logic + 3 graders
    ├── app.py                # FastAPI application (HTTP + WebSocket)
    └── Dockerfile            # Container image definition

Grader Design Notes

All graders are fully deterministic — no LLM-as-judge. They use:

Regex pattern matching for numeric values (calories, protein grams, frequencies)
Hardcoded keyword lists per medical condition (e.g., KNEE_CONTRAINDICATED, SPINAL_COMPRESSION)
Section parsing (e.g., extracting the "Client A" section of a multi-client plan)

This ensures reproducible scores across runs. Because no LLM judges the output, the grader also cannot be gamed through prompt injection.
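The first two techniques above might look like the sketch below. The list name KNEE_CONTRAINDICATED comes from this README, but its contents here are assumed from the Task 2 description, and the regex and helper names are hypothetical rather than the grader's actual rules.

```python
import re

# Assumed contents, based on "no running, jumping, deep squats" in Task 2.
KNEE_CONTRAINDICATED = ("running", "jumping", "deep squat")

# Matches "140 g of protein", "150g protein", "160 grams protein".
PROTEIN_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*g(?:rams)?\s+(?:of\s+)?protein", re.IGNORECASE
)


def find_contraindicated(text: str, keywords=KNEE_CONTRAINDICATED) -> list[str]:
    """Return the keywords that appear anywhere in the text."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def extract_protein_grams(text: str):
    """Return the first protein amount in grams, or None if absent."""
    match = PROTEIN_RE.search(text)
    return float(match.group(1)) if match else None


plan = "Three days of cycling and swimming. Aim for 140 g of protein daily."
print(find_contraindicated(plan))
print(extract_protein_grams(plan))
```

Plain substring matching like this would also flag a sentence such as "avoid running", so a real keyword grader needs some handling of negation; that is left out here.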

Endpoints

Endpoint	Method	Description
`/reset`	POST	Start a new episode, receive client scenario
`/step`	POST	Submit prescription, receive graded feedback
`/state`	GET	Get current episode state
`/health`	GET	Health check
`/docs`	GET	OpenAPI/Swagger documentation
`/web`	GET	Interactive web UI
`/ws`	WebSocket	Persistent session endpoint
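After starting the container, a client may want to wait for the server before calling /reset. The helper below polls the /health endpoint listed above using only the standard library. It is a sketch: the function names are hypothetical, and it assumes that a ready server answers /health with HTTP 200, which this README does not state explicitly.

```python
import time
import urllib.error
import urllib.request


def _http_status(url: str):
    """Return the HTTP status code for a GET, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None


def wait_for_health(base_url: str = "http://localhost:8000",
                    timeout_s: float = 30.0, interval_s: float = 1.0,
                    fetch_status=_http_status) -> bool:
    """Poll GET /health until it returns 200 (assumed) or time runs out."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_status(url) == 200:
            return True
        time.sleep(interval_s)
    return False


# Usage (needs the server from the Quick Start section to be running):
#   if wait_for_health():
#       print("server is up")
```

The fetch_status parameter exists so the polling logic can be exercised without a network connection.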
