Ember follows three core principles:
- Simple by Default - Zero configuration, direct function calls, no boilerplate
- Progressive Disclosure - Advanced features available when needed, hidden when not
- 10x Performance - Automatic optimization without manual tuning
```
┌────────────────────────────────────────────────────────────┐
│                         PUBLIC API                         │
├─────────────────┬─────────────────┬────────────────────────┤
│    models()     │   operators()   │     data.stream()      │
│ Direct LLM API  │  Composable AI  │     Streaming Data     │
└────────┬────────┴────────┬────────┴───────┬────────────────┘
         │                 │                │
┌────────▼────────┬────────▼────────┬───────▼────────────────┐
│ Model Registry  │ Operator System │     Data Pipeline      │
│                 │                 │                        │
│ • Provider      │ • Composition   │ • Loaders              │
│   Resolution    │ • Validation    │ • Transformers         │
│ • Cost Tracking │ • JAX Pytrees   │ • Samplers             │
└────────┬────────┴────────┬────────┴───────┬────────────────┘
         │                 │                │
┌────────▼─────────────────▼────────────────▼────────────────┐
│                         XCS ENGINE                         │
│                                                            │
│ • Automatic JIT Compilation                                │
│ • Parallelism Detection                                    │
│ • Execution Optimization                                   │
└────────────────────────────────────────────────────────────┘
```
The CLI provides comprehensive management and introspection capabilities:

```bash
# Interactive setup with provider selection
ember setup                                    # Launches React-based wizard

# Configuration management
ember configure get KEY                        # Get config value
ember configure set KEY VALUE                  # Set config value
ember configure list                           # Show all config
ember configure show SECTION                   # Show section

# Context introspection
ember context view                             # View resolved configuration
ember context view --format json               # Output as JSON
ember context view --filter models             # Filter to specific path
ember context validate                         # Validate configuration

# Registry introspection
ember registry list-models                     # List available models
ember registry list-models --provider openai   # Filter by provider
ember registry list-models --verbose           # Detailed information
ember registry list-providers                  # Show provider status
ember registry info gpt-4                      # Detailed model info

# Testing and discovery
ember test [--model MODEL]                     # Test connection
```

The setup wizard (@ember-ai/setup) features:
- React/Ink-based interactive UI
- Provider-specific API key validation
- Automatic configuration file creation
- Integration with EmberContext for credential storage
Centralized configuration and credential management with thread-safe and async-safe isolation.
The context system provides a simplified, unified API following the principle of "one obvious way":
```python
from ember import context

# Primary API - simple and clear
ctx = context.get()  # Get current context (creates if needed)

# Context manager for temporary overrides
with context.manager(models={"default": "gpt-4", "temperature": 0.9}) as ctx:
    # All operations in this block use these settings
    response = models("Hello")  # Uses gpt-4 with temperature 0.9
# Original context automatically restored

# Direct configuration access
from ember.context import get_config, set_config

default_model = get_config("models.default", "gpt-3.5-turbo")
set_config("models.temperature", 0.7)
```

Key Features:
- Thread-safe and async-safe: Context propagates correctly across boundaries
- Hierarchical configuration: Child contexts inherit from parents
- Multiple credential sources: Runtime > Environment > Config file > Defaults
- Clean scoping: Context managers ensure proper cleanup
Configuration Priority:
- Runtime context (highest)
- Environment variables
- ~/.ember/config.yaml
- Internal defaults (lowest)
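The precedence chain above can be sketched as a simple first-match lookup. This is an illustration of the rule, not Ember's internal code; the layer dictionaries and the `resolve` helper are assumptions for the sketch:

```python
import os

def resolve(key, runtime, file_config, defaults, env_var=None):
    """Return the first value found, highest-priority layer first:
    runtime context > environment > config file > internal defaults."""
    if key in runtime:
        return runtime[key]
    if env_var is not None and env_var in os.environ:
        return os.environ[env_var]
    if key in file_config:
        return file_config[key]
    return defaults.get(key)

# Runtime context wins over the config file and defaults
value = resolve(
    "models.default",
    runtime={"models.default": "gpt-4"},
    file_config={"models.default": "gpt-3.5-turbo"},
    defaults={"models.default": "gpt-3.5-turbo"},
)
```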
Async Context Propagation:
The context system uses Python's contextvars for proper async isolation:
```python
import asyncio
from ember.context import get_context, create_context

async def process_with_model(text: str) -> str:
    # Context automatically propagates across async boundaries
    ctx = get_context()  # Gets the correct context
    model = ctx.get_config("models.default")
    print(f"Task {asyncio.current_task().get_name()}: Using {model}")
    # Context remains isolated even with concurrent execution
    await asyncio.sleep(0.1)
    return f"Processed '{text}' with {model}"

async def main():
    # Each task gets its own isolated context
    async def task_with_context(model: str, text: str):
        with create_context(models={"default": model}):
            return await process_with_model(text)

    # Run named tasks concurrently with different contexts
    results = await asyncio.gather(
        asyncio.create_task(task_with_context("gpt-4", "Hello"), name="task1"),
        asyncio.create_task(task_with_context("claude-3-opus", "World"), name="task2"),
        asyncio.create_task(task_with_context("gemini-1.5-pro-latest", "Async"), name="task3"),
    )
    for result in results:
        print(result)

if __name__ == "__main__":
    asyncio.run(main())

# Output:
# Task task1: Using gpt-4
# Task task2: Using claude-3-opus
# Task task3: Using gemini-1.5-pro-latest
# Processed 'Hello' with gpt-4
# Processed 'World' with claude-3-opus
# Processed 'Async' with gemini-1.5-pro-latest
```

Thread-Safe Context Usage:
The context system uses thread-local storage with proper locking:
```python
import threading
from ember.context import get_context, create_context

def worker(name: str, model: str):
    # Each thread gets isolated context
    with create_context(models={"default": model}):
        ctx = get_context()
        print(f"{name}: Using {ctx.get_config('models.default')}")
        # Do work with thread-local model

# Launch workers with different models
threads = [
    threading.Thread(target=worker, args=("Worker1", "gpt-4")),
    threading.Thread(target=worker, args=("Worker2", "claude-3")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Configuration Management:
```python
from ember.context import get_context, with_context
from ember.api import models

# Get and modify configuration
ctx = get_context()
ctx.set_config("models.temperature", 0.7)
ctx.save()  # Persist to disk

# Temporary configuration overrides
with with_context(models={"temperature": 0.9, "max_tokens": 2000}):
    # This block uses high temperature and more tokens
    response = models("gpt-4", "Write a creative story")

# Back to original settings
response = models("gpt-4", "Summarize this document")  # Uses temperature=0.7
```

The context system provides secure, hierarchical credential management:
```python
from ember import context

ctx = context.get()

# Credential lookup hierarchy (first found wins):
# 1. Runtime context credentials
# 2. Environment variables (OPENAI_API_KEY, etc.)
# 3. Config file (~/.ember/config.yaml)
# 4. Separate credentials file (~/.ember/credentials.yaml)

# Get credential with fallback chain
api_key = ctx.get_credential("openai", "OPENAI_API_KEY")

# Set credentials at runtime (not persisted)
with context.manager(credentials={"openai_api_key": "sk-temp-key"}):
    # This block uses the temporary key
    response = models("gpt-4", "Hello")

# Persist credentials securely
ctx.set_config("credentials.openai_api_key", "sk-prod-key")
ctx.save()  # Saves to ~/.ember/config.yaml with proper permissions
```

Security Features:
- Credentials never logged or displayed in errors
- Config files created with 0600 permissions (user-only read/write)
- Support for credential rotation without code changes
- Clear error messages that don't expose sensitive data
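The "never expose sensitive data" guarantee can be illustrated with a small masking helper. The `redact` function below is a hypothetical sketch, not Ember's actual implementation:

```python
def redact(value: str, visible: int = 4) -> str:
    """Mask a secret, keeping only a short prefix for identification.
    (Hypothetical helper for illustration only.)"""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)

# An error message can then reference the credential without leaking it
key = "sk-prod-abc123"
message = f"Authentication failed for credential {redact(key)}"
```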
Direct LLM invocation without client initialization:
```python
# Simple case - no setup required
response = models("gpt-4", "Hello world")

# Advanced case - reusable configuration
gpt4 = models.instance("gpt-4", temperature=0.7)
```

Design Decisions:
- Registry pattern for provider management
- Lazy initialization for fast startup
- Integrated cost and usage tracking
- Thread-safe singleton implementation
Composable building blocks for AI applications:
```python
# Function decorator approach
@operators.op
def summarize(text: str) -> str:
    return models("gpt-4", f"Summarize: {text}")

# Class-based with validation
class ValidatedOp(Operator):
    input_spec = InputModel
    output_spec = OutputModel
```

Design Decisions:
- JAX pytree compatibility for automatic differentiation
- Optional but powerful validation system
- Composition over inheritance
- Zero overhead when validation not used
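"Composition over inheritance" can be shown with a minimal stand-in for the operator type. The `Op` class and the `>>` chaining operator are assumptions made for this sketch, not necessarily Ember's API:

```python
class Op:
    """Toy callable wrapper: operators compose by building new
    operators rather than by subclassing."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

    def __rshift__(self, other):
        # (f >> g)(x) == g(f(x))
        return Op(lambda x: other(self(x)))

def op(fn):
    return Op(fn)

@op
def double(x):
    return x * 2

@op
def increment(x):
    return x + 1

pipeline = double >> increment  # runs double, then increment
```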
Streaming-first data loading with progressive enhancement:
```python
# Basic streaming
for item in stream("dataset"):
    process(item)

# Chained transformations
stream("dataset").filter(valid).transform(clean).batch(32)
```

Design Decisions:
- Memory-efficient streaming by default
- Explicit materialization when needed
- Protocol-based extensibility
- Built-in caching layer
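The chained, memory-efficient style above can be mimicked with plain generators. This `Stream` class is a simplified illustration of the design, not Ember's data API:

```python
class Stream:
    """Chainable, lazy stream built on generators: nothing is
    materialized until iteration (illustration only)."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def filter(self, pred):
        return Stream(x for x in self._it if pred(x))

    def transform(self, fn):
        return Stream(fn(x) for x in self._it)

    def batch(self, size):
        def batches():
            buf = []
            for x in self._it:
                buf.append(x)
                if len(buf) == size:
                    yield buf
                    buf = []
            if buf:
                yield buf  # final partial batch
        return Stream(batches())

    def __iter__(self):
        return self._it
```

Each method wraps the previous iterator rather than copying data, so memory use stays constant regardless of dataset size; calling `list(...)` is the explicit materialization point.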
Zero-configuration optimization:
```python
@xcs.jit
def complex_workflow(data):
    # Automatically optimized
    return pipeline(data)

# Automatic parallelization
results = xcs.vmap(process)(batch)
```

Design Decisions:
- Tracing-based optimization
- Automatic strategy selection
- JAX backend for numerical operations
- Orchestration for I/O-bound tasks
Single source of truth for each resource type:
- `ModelRegistry` - LLM provider management
- `DataRegistry` - Dataset loader management
- Thread-safe with proven single-lock approach
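The single-lock approach can be sketched as follows. This is an illustration of the pattern, not the actual `ModelRegistry` code:

```python
import threading

class Registry:
    """One lock guards all reads and writes: simple to reason about,
    and uncontended at registry scale (sketch of the pattern only)."""
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def register(self, name, item):
        with self._lock:
            if name in self._items:
                raise ValueError(f"{name!r} already registered")
            self._items[name] = item

    def get(self, name):
        with self._lock:
            return self._items[name]

    def list(self):
        with self._lock:
            return sorted(self._items)
```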
JAX-compatible base class for all operators:
- Automatic parameter detection
- Pytree registration
- Clean composition semantics
Three levels of API complexity:
- Simple Functions - Direct calls for basic use
- Decorators - Enhancement without boilerplate
- Classes - Full control when needed
Optional but powerful when used:
- Pydantic models for validation
- Type hints guide system behavior
- Runtime validation from static types
- Function decorated with `@jit`
- First call traces execution
- IR built from trace
- Optimal backend selected:
  - JAX for numerical operations
  - Async orchestration for I/O
- Subsequent calls use compiled version
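The trace-on-first-call lifecycle can be approximated with a caching decorator. The real engine builds an IR and selects a backend; this toy version only shows the compile-once, reuse-afterwards shape:

```python
def toy_jit(fn):
    """Sketch of trace-and-cache: the first call per argument
    signature 'compiles'; later calls reuse the cached version."""
    compiled = {}  # per-signature cache standing in for compiled IR

    def wrapper(*args):
        key = tuple(type(a).__name__ for a in args)
        if key not in compiled:
            # First call: a real engine would trace fn and optimize it
            compiled[key] = fn
            wrapper.compile_count += 1
        return compiled[key](*args)

    wrapper.compile_count = 0
    return wrapper

@toy_jit
def add(a, b):
    return a + b
```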
- Automatic detection of map operations
- Data dependency analysis
- Optimal chunking for throughput
- Zero configuration required
- Streaming by default for data
- Lazy evaluation where possible
- Explicit materialization points
- Automatic garbage collection hints
- Implement `BaseProvider` interface
- Register with `ModelRegistry`
- No core changes required

- Inherit from `Operator` base
- Define `call()` method
- Optional validation specs
- Automatic JAX integration

- Implement loader protocol
- Register with `DataRegistry`
- Streaming support automatic
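Putting the provider steps together, a new backend might look like the sketch below. The `BaseProvider` shape and its `complete` method are assumptions for illustration; the dict stands in for `ModelRegistry` registration:

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Assumed interface shape for the sketch."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoProvider(BaseProvider):
    """Trivial provider showing the extension points: implement the
    interface, then register - no core changes required."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

# Registration stands in for ModelRegistry registration
registry = {}
registry["echo"] = EchoProvider()
```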
- Isolated component testing
- Minimal test doubles
- Type testing utilities
- Cross-module interactions
- Real provider testing
- Performance benchmarks
- Helper modules for common setups
- Simplified imports for isolation
- Comprehensive coverage tracking
- Environment variable loading
- No keys in code
- Secure defaults
- Optional but recommended
- Pydantic integration
- Type-safe boundaries
- Provider-level handling
- Automatic retry logic
- Exponential backoff
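The retry behavior above can be sketched as a wrapper with exponentially growing delays. This is a generic illustration of the pattern, not Ember's provider code; the `sleep` parameter is injected so the example stays testable:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry with exponential backoff: delays of 0.5s, 1s, 2s, ..."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts - surface the error
                sleep(base_delay * (2 ** attempt))
    return wrapper
```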
- Distributed Execution - Multi-node XCS
- Model Quantization - Automatic optimization
- Streaming Inference - Token-level streaming
- Edge Deployment - Browser/mobile runtime
- Simple API remains simple
- Advanced features stay optional
- Performance improvements automatic
- Backward compatibility preserved
- Decision: Use direct function calls instead of dependency injection
- Rationale: Eliminates boilerplate, improves discoverability
- Consequences: Simpler API, easier testing, less flexibility

- Decision: Single registry per resource type
- Rationale: Clear ownership, thread-safe, extensible
- Consequences: Centralized management, potential bottleneck

- Decision: Base operators on JAX pytrees
- Rationale: Automatic differentiation, JIT compilation
- Consequences: Power-user features, slight complexity

- Decision: Default to streaming for data operations
- Rationale: Memory efficiency, scalability
- Consequences: Explicit materialization sometimes needed
This architecture embodies the principles of simplicity, performance, and extensibility that guide Ember's development.