Enhancement: Type System and Agent Schema Design

# VAT Type System and Agent Schema Design

## Overview

This design defines a type system for data contracts and an agent schema for compute units in the Vibe Agent Toolkit. The key principles are:

1. **Separation of Concerns**: Types (data contracts) are separate from agents (compute units)
2. **Postel's Law**: Agents are liberal in what they accept (multiple input types) and conservative in what they send (single output type)
3. **Type Reusability**: Types are shared across agents, enabling composition
4. **Orchestration Agnostic**: The same agent runs unchanged on filesystem, in-memory, HTTP, or other orchestrators

---

## Type System

### Data Type Definition

A **Data Type** defines what data represents semantically, independent of which agent produces or consumes it.

```typescript
// @vibe-agent-toolkit/agent-schema

/**
 * Data Type - defines semantic content type
 */
export interface DataType {
  /**
   * Unique type identifier (semantic type ID)
   * Convention: category/name
   * Examples: 'document/requirements', 'data/user-collection', 'meta/validation-report'
   */
  id: string;

  /** Human-readable name */
  name: string;

  /** Description of what this type represents */
  description: string;

  /** Physical format constraints */
  format: {
    /** MIME type (physical format) */
    mimeType: string;

    /** Schema resource URI (for structured types); see SchemaReference below */
    schemaUri?: string;
  };

  /** Tags for categorization and discovery */
  tags?: string[];

  // No version field - add later when versioning strategy is determined
}

/**
 * Reference to a schema resource
 *
 * Note: Schemas are resources, not inline objects.
 * Use resource URI to reference schema files.
 */
export interface SchemaReference {
  /** Schema resource URI (relative or absolute path) */
  schemaUri: string;
}
```

### Zod Schema

```typescript
import { z } from 'zod';

export const SchemaReferenceSchema = z.object({
  schemaUri: z.string(),
});

export const DataTypeSchema = z.object({
  id: z.string(),
  name: z.string(),
  description: z.string(),
  format: z.object({
    mimeType: z.string(),
    schemaUri: z.string().optional(),
  }),
  tags: z.array(z.string()).optional(),
  // No version field
});
```

### VAT Resource Types

Resources in VAT have semantic types that indicate their purpose, separate from their physical MIME type.

```typescript
/**
 * VAT Resource types - semantic purpose of resources
 */
export type VATResourceType =
  // Schemas
  | 'vat:schema'

  // Prompts
  | 'vat:prompt/system'      // System prompts for agent behavior
  | 'vat:prompt/user'        // User prompt templates
  | 'vat:prompt/tool'        // Tool use prompts

  // Knowledge & Context
  | 'vat:knowledge/reference'  // Reference documentation
  | 'vat:knowledge/context'    // Background context for agents
  | 'vat:knowledge/faq'        // FAQ knowledge base

  // Examples & Tests
  | 'vat:example/input'      // Example inputs
  | 'vat:example/output'     // Example outputs
  | 'vat:test-case'          // Test case definitions
  | 'vat:test-fixture'       // Test fixture data

  // Configuration
  | 'vat:config/agent'       // Agent configuration (agent.yaml)
  | 'vat:config/type'        // Type definitions
  | 'vat:config/pipeline'    // Pipeline configurations

  // Documentation
  | 'vat:guide/user'         // User-facing guides
  | 'vat:guide/developer'    // Developer documentation
  | 'vat:guide/architecture' // Architecture docs

  // Data
  | 'vat:data/structured'    // Structured data files
  | 'vat:data/unstructured'  // Unstructured data
  | 'vat:data/embedding';    // Embedding vectors

/**
 * Resource - any artifact used by agents
 */
export interface Resource {
  /** Resource URI (relative or absolute) */
  uri: string;

  /** Semantic resource type (purpose) */
  resourceType: VATResourceType;

  /** Physical format (IANA MIME type) */
  mimeType: string;

  /** Resource content */
  content: string | Buffer;

  /** Resource metadata */
  metadata: ResourceMetadata;
}

/**
 * Resource metadata
 */
export interface ResourceMetadata {
  title?: string;
  description?: string;
  tags?: string[];

  /**
   * Checksum in Subresource Integrity (SRI) format
   * Format: sha256-base64hash
   * Example: "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
   */
  checksum?: string;

  /** Additional metadata */
  [key: string]: unknown;
}

/**
 * Default MIME types for resource types
 */
const DEFAULT_MIME_TYPES: Record<VATResourceType, string> = {
  'vat:schema': 'application/schema+json',
  'vat:prompt/system': 'text/markdown',
  'vat:prompt/user': 'text/markdown',
  'vat:prompt/tool': 'text/markdown',
  'vat:knowledge/reference': 'text/markdown',
  'vat:knowledge/context': 'text/markdown',
  'vat:knowledge/faq': 'text/markdown',
  'vat:example/input': 'application/json',
  'vat:example/output': 'application/json',
  'vat:test-case': 'application/json',
  'vat:test-fixture': 'application/json',
  'vat:config/agent': 'application/x-yaml',
  'vat:config/type': 'application/json',
  'vat:config/pipeline': 'application/x-yaml',
  'vat:guide/user': 'text/markdown',
  'vat:guide/developer': 'text/markdown',
  'vat:guide/architecture': 'text/markdown',
  'vat:data/structured': 'application/json',
  'vat:data/unstructured': 'text/plain',
  'vat:data/embedding': 'application/octet-stream',
};
```

### Registries

Registries are **instantiated per context** (not global singletons) for testability and isolation.

```typescript
// API outline: method bodies are omitted for brevity.

/**
 * Registry of known data types
 */
export class TypeRegistry {
  private types = new Map<string, DataType>();

  register(type: DataType): void;
  get(id: string): DataType | undefined;
  has(id: string): boolean;
  list(filter?: { tags?: string[]; mimeType?: string }): DataType[];
  findByTag(tag: string): DataType[];
}

/**
 * Registry of known agents
 */
export class AgentRegistry {
  private agents = new Map<string, AgentConfig>();

  register(config: AgentConfig): void;
  get(id: string): AgentConfig | undefined;
  has(id: string): boolean;
  list(filter?: { tags?: string[]; inputType?: string; outputType?: string }): AgentConfig[];
  findConsumers(typeId: string): AgentConfig[];
  findProducers(typeId: string): AgentConfig[];
  findMetaAgents(category?: string): AgentConfig[];
}

/**
 * Registry of known resources
 */
export class ResourceRegistry {
  private resources = new Map<string, Resource>();

  register(resource: Resource): void;
  get(uri: string): Resource | undefined;
  has(uri: string): boolean;
  list(filter?: { resourceType?: VATResourceType; tags?: string[] }): Resource[];
  findByType(resourceType: VATResourceType): Resource[];
}

// Usage: Create instances (not singletons)
const typeRegistry = new TypeRegistry();
const agentRegistry = new AgentRegistry();
const resourceRegistry = new ResourceRegistry();
```
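
The outline above lists signatures only. A minimal sketch of how `TypeRegistry` could be implemented follows. It assumes that duplicate IDs are rejected and that a tag filter matches only types carrying every requested tag; neither rule is stated in the design, and `DataType` is trimmed to the fields the registry touches.

```typescript
// Sketch only: DataType is reduced to the fields this registry needs.
export interface DataType {
  id: string;
  name: string;
  description: string;
  format: { mimeType: string };
  tags?: string[];
}

export class TypeRegistry {
  private readonly types = new Map<string, DataType>();

  /** Assumption: duplicate IDs are rejected rather than silently overwritten. */
  register(type: DataType): void {
    if (this.types.has(type.id)) {
      throw new Error(`Type already registered: ${type.id}`);
    }
    this.types.set(type.id, type);
  }

  get(id: string): DataType | undefined {
    return this.types.get(id);
  }

  has(id: string): boolean {
    return this.types.has(id);
  }

  /** Assumption: a type matches when it carries every requested tag. */
  list(filter: { tags?: string[]; mimeType?: string } = {}): DataType[] {
    return Array.from(this.types.values()).filter((type) => {
      const tagsMatch = (filter.tags ?? []).every(
        (tag) => type.tags?.includes(tag) ?? false
      );
      const mimeMatch =
        filter.mimeType === undefined || type.format.mimeType === filter.mimeType;
      return tagsMatch && mimeMatch;
    });
  }

  findByTag(tag: string): DataType[] {
    return this.list({ tags: [tag] });
  }
}

// Two independent instances: registering in one does not affect the other.
const registryA = new TypeRegistry();
const registryB = new TypeRegistry();
registryA.register({
  id: 'document/requirements',
  name: 'Requirements Document',
  description: 'Textual requirements document',
  format: { mimeType: 'text/markdown' },
  tags: ['document', 'text'],
});
```

Because each registry owns its own `Map`, a test can create a fresh instance per case without any reset logic.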

### Schema URI Resolution

Schema references use URIs that are resolved using two protocols:

**Package-relative** (default):
```typescript
schemaUri: 'schemas/User.schema.json'
// Resolved relative to current package root
// Example: <package-root>/schemas/User.schema.json
```

**npm protocol** (external packages):
```typescript
schemaUri: 'npm:@acme/schemas/Product.schema.json'
// Resolved from installed packages via Node module resolution
// Supports scoped packages (@acme/*), including ones installed from private registries
// Registry routing is configured in .npmrc and applied by npm at install time
```

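**Resolution process:**
1. If the URI starts with `npm:`, strip the prefix and resolve the remainder with Node's module resolution against the installed `node_modules`
2. Otherwise, resolve the URI relative to the package root

Node itself never reads `.npmrc`. The package manager reads it at install time to route scoped packages (`@scope:registry=...`) and to supply auth tokens, so a package must already be installed before its `npm:` URI can resolve.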

**Example .npmrc support:**
```
@acme:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=ghp_***
registry=https://registry.npmjs.org/
```

With this config, packages are fetched at install time as follows:
- `npm:@acme/schemas/Product.schema.json` → GitHub Packages (private)
- `npm:@vibe-agent-toolkit/schemas/User.schema.json` → public npm
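
The two resolution paths can be sketched as a single function. This is an illustration under assumptions, not the toolkit's resolver: `resolveSchemaUri`, `packageRoot`, and the injectable `resolveModule` parameter are hypothetical names. The default resolver uses Node's `createRequire`, which only finds packages that are already installed, and which may refuse a subpath if the target package restricts it through an `exports` map.

```typescript
import { createRequire } from 'node:module';
import * as path from 'node:path';

/** Maps a module specifier to an absolute file path. */
type ModuleResolver = (specifier: string) => string;

const NPM_PREFIX = 'npm:';

/**
 * Resolve a schemaUri to a file path.
 * `packageRoot` must be an absolute path; `resolveModule` is injectable
 * so the npm branch can be exercised without installed packages.
 */
export function resolveSchemaUri(
  schemaUri: string,
  packageRoot: string,
  resolveModule: ModuleResolver = (specifier) =>
    createRequire(path.join(packageRoot, 'package.json')).resolve(specifier)
): string {
  if (schemaUri.startsWith(NPM_PREFIX)) {
    // Strip the prefix and defer to Node's node_modules lookup.
    return resolveModule(schemaUri.slice(NPM_PREFIX.length));
  }
  // Default: package-relative path.
  return path.resolve(packageRoot, schemaUri);
}

// Package-relative URIs never touch the module resolver.
const exampleRoot = path.resolve('/tmp/example-package');
const localPath = resolveSchemaUri('schemas/User.schema.json', exampleRoot);
```

Injecting the resolver keeps the `.npmrc` concern out of this function entirely: by the time it runs, the package manager has already done the registry routing.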

### Example Type Definitions

```typescript
// Registries shown as variables for clarity (instantiated elsewhere)

// Unstructured document (no schema)
typeRegistry.register({
  id: 'document/requirements',
  name: 'Requirements Document',
  description: 'Textual requirements document',
  format: {
    mimeType: 'text/markdown',
  },
  tags: ['document', 'unstructured', 'text'],
});

// Structured data (references schema resource)
typeRegistry.register({
  id: 'data/user-collection',
  name: 'User Collection',
  description: 'Structured collection of user records',
  format: {
    mimeType: 'application/json',
    schemaUri: 'schemas/UserCollection.schema.json',  // Link to schema resource
  },
  tags: ['structured', 'data', 'users'],
});

// Meta-type (output of meta-agents)
typeRegistry.register({
  id: 'meta/validation-report',
  name: 'Validation Report',
  description: 'Results from validation meta-agent',
  format: {
    mimeType: 'application/json',
    schemaUri: 'schemas/ValidationReport.schema.json',
  },
  tags: ['meta', 'validation', 'report'],
});
```

---

## Agent Schema

### Core Definitions

An **Agent** is a compute unit that transforms inputs to outputs. Agents reference types by ID.

```typescript
// @vibe-agent-toolkit/agent-schema

/**
 * Input Slot - liberal in what it accepts
 */
export interface InputSlot {
  /**
   * Variable name in agent code
   * Example: 'sourceDocument', 'userList', 'configFile'
   */
  name: string;

  /**
   * Accepted type IDs (ordered by preference)
   * Agent can handle any of these types
   * Orchestrator will auto-convert if needed
   */
  types: string[];

  /** Description (agent-specific context) */
  description?: string;

  /** Whether this input is required */
  required?: boolean;
}

/**
 * Output Slot - conservative in what it sends
 */
export interface OutputSlot {
  /**
   * Variable name in agent code
   * Example: 'processedData', 'report', 'summary'
   */
  name: string;

  /**
   * Single type ID produced (deterministic)
   * Agent ALWAYS produces this exact type
   */
  type: string;

  /** Description (agent-specific context) */
  description?: string;

  /** Whether this output is always produced */
  required?: boolean;
}

/**
 * Agent Configuration
 */
export interface AgentConfig {
  /** Unique agent identifier */
  id: string;

  /** Human-readable name */
  name: string;

  /** Description */
  description?: string;

  /** Version (semantic versioning) */
  version?: string;

  /** Input slots */
  inputs?: InputSlot[];

  /** Output slots */
  outputs?: OutputSlot[];

  /** Tags for categorization */
  tags?: string[];
}
```

### Zod Schemas

```typescript
import { z } from 'zod';

export const InputSlotSchema = z.object({
  name: z.string(),
  types: z.array(z.string()).min(1),
  description: z.string().optional(),
  required: z.boolean().default(true),
});

export const OutputSlotSchema = z.object({
  name: z.string(),
  type: z.string(),
  description: z.string().optional(),
  required: z.boolean().default(true),
});

export const AgentConfigSchema = z.object({
  id: z.string(),
  name: z.string(),
  description: z.string().optional(),
  version: z.string().optional(),
  inputs: z.array(InputSlotSchema).optional(),
  outputs: z.array(OutputSlotSchema).optional(),
  tags: z.array(z.string()).optional(),
});
```

### Example Agent Definitions

```typescript
// Registries shown as variables for clarity (instantiated elsewhere)

// Regular agent - transforms data
agentRegistry.register({
  id: 'text-extractor',
  name: 'Text Extraction Agent',
  description: 'Extracts text from documents',
  version: '1.0.0',
  inputs: [
    {
      name: 'sourceDocument',
      types: [
        'document/pdf',           // Preferred
        'document/docx',          // Accepted
        'document/plain-text'     // Fallback
      ],
      required: true
    }
  ],
  outputs: [
    {
      name: 'extractedText',
      type: 'document/plain-text',  // Always produces plain text
      required: true
    }
  ],
  tags: ['extraction', 'document-processing']
});

// Meta-agent - validates data
agentRegistry.register({
  id: 'schema-validator',
  name: 'Schema Validation Agent',
  version: '1.0.0',
  inputs: [
    {
      name: 'data',
      types: ['*'],  // Accepts any structured type
      required: true
    }
  ],
  outputs: [
    {
      name: 'validationReport',
      type: 'meta/validation-report',
      required: true
    },
    {
      name: 'validatedData',
      type: '*',  // Pass-through (same type as input)
      required: true
    }
  ],
  tags: ['meta-agent', 'validation', 'quality']
});

// Meta-agent - converts formats
agentRegistry.register({
  id: 'json-to-yaml-converter',
  name: 'JSON to YAML Converter',
  version: '1.0.0',
  inputs: [
    {
      name: 'data',
      types: ['*'],  // Any type with application/json
      required: true
    }
  ],
  outputs: [
    {
      name: 'converted',
      type: '*',  // Same semantic type, different format
      required: true
    }
  ],
  tags: ['meta-agent', 'converter', 'format']
});
```
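
Since slots reference types by ID, the registry lookups `findConsumers` and `findProducers` reduce to simple filters. The sketch below writes them as free functions over a list of configs. It assumes that `'*'` in an input slot accepts any type ID and that a `'*'` output is a pass-through that does not count as producing a specific type; the design above does not define wildcard semantics, so treat both rules as placeholders.

```typescript
interface InputSlot { name: string; types: string[]; required?: boolean }
interface OutputSlot { name: string; type: string; required?: boolean }
interface AgentConfig {
  id: string;
  name: string;
  inputs?: InputSlot[];
  outputs?: OutputSlot[];
}

const WILDCARD = '*';

/** Assumption: a wildcard input slot accepts every type ID. */
export function slotAccepts(slot: InputSlot, typeId: string): boolean {
  return slot.types.includes(WILDCARD) || slot.types.includes(typeId);
}

/** Agents with at least one input slot that accepts `typeId`. */
export function findConsumers(agents: AgentConfig[], typeId: string): AgentConfig[] {
  return agents.filter((agent) =>
    (agent.inputs ?? []).some((slot) => slotAccepts(slot, typeId))
  );
}

/** Agents with at least one output slot that declares exactly `typeId`. */
export function findProducers(agents: AgentConfig[], typeId: string): AgentConfig[] {
  return agents.filter((agent) =>
    (agent.outputs ?? []).some((slot) => slot.type === typeId)
  );
}

const exampleAgents: AgentConfig[] = [
  {
    id: 'text-extractor',
    name: 'Text Extraction Agent',
    inputs: [{ name: 'sourceDocument', types: ['document/pdf', 'document/docx', 'document/plain-text'] }],
    outputs: [{ name: 'extractedText', type: 'document/plain-text' }],
  },
  {
    id: 'schema-validator',
    name: 'Schema Validation Agent',
    inputs: [{ name: 'data', types: ['*'] }],
    outputs: [
      { name: 'validationReport', type: 'meta/validation-report' },
      { name: 'validatedData', type: '*' },
    ],
  },
];

// Which agents could consume the extractor's output?
const plainTextConsumers = findConsumers(exampleAgents, 'document/plain-text').map((a) => a.id);
```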

---

## Runtime Components

### Agent Data (Runtime Instance)

```typescript
// @vibe-agent-toolkit/agent-runtime

/**
 * Data instance passed between agents at runtime
 * Includes content, type info, and metadata
 */
export interface AgentData<T = unknown> {
  /** The actual data content */
  content: T;

  /** Type information */
  type: {
    /** Type ID from registry */
    id: string;

    /** Physical format (MIME type) */
    mimeType: string;
  };

  /**
   * Quality metadata (optional)
   * Attached by meta-agents that evaluate quality
   */
  quality?: QualityMetadata;

  /**
   * Provenance metadata (optional)
   * Added by orchestrator/runtime wrapper
   */
  provenance?: ProvenanceMetadata;

  /** Additional metadata */
  metadata?: Record<string, unknown>;
}
```

### Quality Metadata

```typescript
/**
 * Quality metadata - attached to data instances
 * Produced by quality evaluation meta-agents
 */
export interface QualityMetadata {
  /** Confidence score (0.0-1.0) */
  confidence?: number;

  /** Completeness score (0.0-1.0) */
  completeness?: number;

  /** Validation results */
  validation?: {
    /** Schema validation passed */
    schemaValid: boolean;

    /** Business rule validation passed */
    rulesValid: boolean;

    /** Validation issues */
    issues?: ValidationIssue[];
  };

  /** Quality assessment from judge agent */
  assessment?: {
    /** Overall quality score */
    score: number;

    /** Judge agent ID */
    judgeId: string;

    /** Judge agent version */
    judgeVersion: string;

    /** Timestamp */
    timestamp: string;

    /** Detailed quality dimensions */
    dimensions?: Record<string, number>;

    /** Human-readable feedback */
    feedback?: string;
  };

  /** Custom quality dimensions */
  dimensions?: Record<string, number>;

  /** Statistics about the data */
  statistics?: Record<string, number | string>;
}

export interface ValidationIssue {
  severity: 'error' | 'warning' | 'info';
  path: string;  // JSON path to problematic field
  message: string;
  code?: string;
}
```
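
One way a pipeline might consume this metadata is a quality gate that blocks data below a threshold. The sketch below is hypothetical: the `QualityGate` shape and `checkQualityGate` function are not part of the design, and it assumes that a missing score fails any gate that asks for it.

```typescript
interface ValidationIssue {
  severity: 'error' | 'warning' | 'info';
  path: string;
  message: string;
  code?: string;
}

// Trimmed to the fields the gate inspects.
interface QualityMetadata {
  confidence?: number;
  completeness?: number;
  validation?: { schemaValid: boolean; rulesValid: boolean; issues?: ValidationIssue[] };
}

/** Hypothetical gate thresholds; not part of the design above. */
interface QualityGate {
  minConfidence?: number;
  minCompleteness?: number;
  requireValid?: boolean;
}

/** Returns the list of failed checks; an empty list means the gate passes. */
export function checkQualityGate(
  quality: QualityMetadata | undefined,
  gate: QualityGate
): string[] {
  const failures: string[] = [];
  // Assumption: an absent score is treated as 0.
  if (gate.minConfidence !== undefined && (quality?.confidence ?? 0) < gate.minConfidence) {
    failures.push(`confidence below ${gate.minConfidence}`);
  }
  if (gate.minCompleteness !== undefined && (quality?.completeness ?? 0) < gate.minCompleteness) {
    failures.push(`completeness below ${gate.minCompleteness}`);
  }
  if (gate.requireValid) {
    const validation = quality?.validation;
    if (!validation || !validation.schemaValid || !validation.rulesValid) {
      failures.push('validation failed or missing');
    }
  }
  return failures;
}

const gateFailures = checkQualityGate(
  { confidence: 0.92, validation: { schemaValid: true, rulesValid: false } },
  { minConfidence: 0.8, requireValid: true }
);
```

Returning reasons instead of a boolean lets an orchestrator log why a gate blocked the data.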

### Provenance Metadata

```typescript
/**
 * Provenance metadata - tracks data lineage
 * Added automatically by orchestrator
 */
export interface ProvenanceMetadata {
  /** Producer information */
  producer: {
    agentId: string;
    instanceId?: string;
    version: string;
    timestamp: string;
  };

  /** Source data lineage */
  sources?: Array<{
    /** Input slot name */
    inputSlot: string;

    /** Type of source data */
    type: string;

    /** Provenance chain from source */
    provenance?: ProvenanceMetadata;
  }>;

  /** Transformations applied */
  transformations?: Array<{
    agentId: string;
    operation: string;
    timestamp: string;
  }>;

  /** Quality evolution through pipeline */
  qualityChain?: Array<{
    agentId: string;
    inputQuality?: QualityMetadata;
    outputQuality?: QualityMetadata;
    degradation?: number;
  }>;
}
```
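
Because `sources` nests `ProvenanceMetadata` recursively, lineage can be recovered with a depth-first walk. The helper below is a sketch; `lineage` is a hypothetical name and the interface is trimmed to the fields the walk needs.

```typescript
// Trimmed to the fields the walk needs.
interface ProvenanceMetadata {
  producer: { agentId: string; instanceId?: string; version: string; timestamp: string };
  sources?: Array<{ inputSlot: string; type: string; provenance?: ProvenanceMetadata }>;
}

/** Depth-first list of producing agent IDs, most recent producer first. */
export function lineage(provenance: ProvenanceMetadata): string[] {
  const agentIds = [provenance.producer.agentId];
  for (const source of provenance.sources ?? []) {
    if (source.provenance) {
      agentIds.push(...lineage(source.provenance));
    }
  }
  return agentIds;
}

// A validation report whose input came from the text extractor.
const reportProvenance: ProvenanceMetadata = {
  producer: { agentId: 'schema-validator', version: '1.0.0', timestamp: '2026-01-13T10:31:00Z' },
  sources: [
    {
      inputSlot: 'data',
      type: 'document/plain-text',
      provenance: {
        producer: { agentId: 'text-extractor', version: '1.0.0', timestamp: '2026-01-13T10:30:00Z' },
      },
    },
  ],
};

const reportLineage = lineage(reportProvenance);
```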

### Agent Context (Execution API)

```typescript
/**
 * Execution context provided to agents
 * Abstracts I/O from orchestration mechanism
 */
export interface AgentContext {
  /** Agent ID */
  agentId: string;

  /** Optional instance ID (for multi-instance agents) */
  instanceId?: string;

  /**
   * Read input by slot name
   * Returns data with type info and metadata
   * Orchestrator handles format conversion if needed
   */
  readInput(slotName: string): Promise<AgentData>;

  /**
   * Write output by slot name
   *
   * @param slotName - Output slot name from agent metadata
   * @param content - The actual data to write
   * @param runtimeMetadata - Optional metadata to attach
   */
  writeOutput(
    slotName: string,
    content: unknown,
    runtimeMetadata?: {
      quality?: QualityMetadata;
      metadata?: Record<string, unknown>;
      provenance?: Partial<ProvenanceMetadata>;
    }
  ): Promise<void>;
}

/**
 * Base agent interface
 * Agents implement this to be executable
 */
export interface IAgent {
  execute(context: AgentContext): Promise<void>;
}
```
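
To illustrate how `AgentContext` keeps agent code independent of transport, here is a sketch of an agent driven by a stub context backed by plain `Map`s. `WordCountAgent`, `createStubContext`, and the slot names are hypothetical, and the interfaces are trimmed copies of the ones above.

```typescript
interface AgentData<T = unknown> {
  content: T;
  type: { id: string; mimeType: string };
}

interface AgentContext {
  agentId: string;
  readInput(slotName: string): Promise<AgentData>;
  writeOutput(slotName: string, content: unknown): Promise<void>;
}

interface IAgent {
  execute(context: AgentContext): Promise<void>;
}

/** Hypothetical agent: reads 'sourceDocument' and writes its word count. */
export class WordCountAgent implements IAgent {
  async execute(context: AgentContext): Promise<void> {
    const input = await context.readInput('sourceDocument');
    const words = String(input.content)
      .split(/\s+/)
      .filter((word) => word.length > 0);
    await context.writeOutput('wordCount', words.length);
  }
}

/** Stub context standing in for an orchestrator; keeps everything in memory. */
export function createStubContext(agentId: string, inputs: Map<string, AgentData>) {
  const outputs = new Map<string, unknown>();
  const context: AgentContext = {
    agentId,
    async readInput(slotName) {
      const data = inputs.get(slotName);
      if (!data) {
        throw new Error(`No input for slot ${slotName}`);
      }
      return data;
    },
    async writeOutput(slotName, content) {
      outputs.set(slotName, content);
    },
  };
  return { context, outputs };
}
```

The agent only ever sees slot names, so the same class could run against a filesystem or HTTP context without modification.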

---

## Orchestration Concepts

### Orchestrator Interface

```typescript
// @vibe-agent-toolkit/orchestration

/**
 * Orchestrator - manages agent execution
 * Different implementations: filesystem, memory, HTTP, etc.
 */
export interface AgentOrchestrator {
  /**
   * Register an agent for execution
   */
  registerAgent(config: AgentConfig, implementation: IAgent): void;

  /**
   * Execute an agent
   */
  executeAgent(agentId: string, instanceId?: string): Promise<void>;

  /**
   * Get output from previous agent
   */
  getOutput(agentId: string, slotName: string, instanceId?: string): Promise<AgentData>;
}
```

### Filesystem Orchestrator Example

```typescript
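import * as path from 'node:path';
import * as fs from 'fs-extra'; // ensureDir comes from fs-extra, not node:fs

/**
 * Filesystem-based orchestrator
 * Reads inputs from files, writes outputs to files
 *
 * Excerpt: registerAgent, executeAgent, getOutput, and findInputFile are
 * omitted, and `typeRegistry` is an instance created elsewhere.
 */
export class FilesystemOrchestrator implements AgentOrchestrator {
  constructor(
    private baseDir: string,
    private formatConverter: FormatConverter
  ) {}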

  async readInput(agentId: string, slotName: string, slot: InputSlot): Promise<AgentData> {
    // Find a file whose type is one of the accepted type IDs
    const file = await this.findInputFile(slotName, slot.types);

    if (!file) {
      throw new Error(`No input file found for ${slotName}`);
    }

    const content = await fs.readFile(file.path);
    const typeInfo = typeRegistry.get(file.typeId)!;

    // Auto-convert to preferred format if needed
    const preferredType = typeRegistry.get(slot.types[0]!)!;
    if (typeInfo.format.mimeType !== preferredType.format.mimeType &&
        this.formatConverter.canConvert(typeInfo.format.mimeType, preferredType.format.mimeType)) {

      const converted = await this.formatConverter.convert(
        content,
        typeInfo.format.mimeType,
        preferredType.format.mimeType
      );

      return {
        content: converted,
        type: { id: slot.types[0]!, mimeType: preferredType.format.mimeType },
        metadata: { originalMimeType: typeInfo.format.mimeType, converted: true }
      };
    }

    return {
      content,
      type: { id: file.typeId, mimeType: typeInfo.format.mimeType }
    };
  }

  async writeOutput(
    agentId: string,
    slotName: string,
    slot: OutputSlot,
    content: unknown,
    metadata?: any
  ): Promise<void> {
    const typeInfo = typeRegistry.get(slot.type)!;
    const extension = this.mimeTypeToExtension(typeInfo.format.mimeType);
    const outputPath = `${this.baseDir}/output/${agentId}/${slotName}.${extension}`;

    await fs.ensureDir(path.dirname(outputPath));
    await fs.writeFile(outputPath, content as Buffer | string);

    // Add provenance automatically, even when the agent supplies no metadata
    const provenance: ProvenanceMetadata = {
      producer: {
        agentId,
        version: '1.0.0', // placeholder; should come from the registered AgentConfig
        timestamp: new Date().toISOString()
      }
    };

    // Always write the sidecar file so provenance is never dropped
    const metadataPath = `${outputPath}.meta.json`;
    await fs.writeFile(metadataPath, JSON.stringify({
      ...metadata,
      provenance
    }, null, 2));
  }

  private mimeTypeToExtension(mimeType: string): string {
    const map: Record<string, string> = {
      'application/json': 'json',
      'application/x-yaml': 'yaml',
      'text/yaml': 'yaml',
      'text/markdown': 'md',
      'text/plain': 'txt',
      'text/csv': 'csv',
      'application/pdf': 'pdf',
    };
    return map[mimeType] || 'bin';
  }
}
```

### In-Memory Orchestrator Example

```typescript
/**
 * In-memory orchestrator
 * Keeps all data in memory (useful for testing)
 */
export class InMemoryOrchestrator implements AgentOrchestrator {
  private outputs = new Map<string, AgentData>();

  async readInput(agentId: string, slotName: string, slot: InputSlot): Promise<AgentData> {
    const key = `${agentId}:${slotName}`;
    const data = this.outputs.get(key);

    if (!data) {
      throw new Error(`Input ${slotName} not found for agent ${agentId}`);
    }

    return data;
  }

  async writeOutput(
    agentId: string,
    slotName: string,
    slot: OutputSlot,
    content: unknown,
    metadata?: any
  ): Promise<void> {
    const key = `${agentId}:${slotName}`;
    const typeInfo = typeRegistry.get(slot.type)!;

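    const data: AgentData = {
      ...metadata, // agent-supplied quality/metadata first
      content,
      type: { id: slot.type, mimeType: typeInfo.format.mimeType },
      // Orchestrator-owned producer info goes last so agent metadata cannot overwrite it
      provenance: {
        ...metadata?.provenance,
        producer: {
          agentId,
          version: '1.0.0',
          timestamp: new Date().toISOString()
        }
      }
    };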

    this.outputs.set(key, data);
  }

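  // Matches the AgentOrchestrator signature: async, and rejects when the output is missing
  async getOutput(agentId: string, slotName: string): Promise<AgentData> {
    const data = this.outputs.get(`${agentId}:${slotName}`);
    if (!data) {
      throw new Error(`Output ${slotName} not found for agent ${agentId}`);
    }
    return data;
  }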
}
```

---

## Schema Generation and Checksumming

### Zod-First Development Workflow

Agent authors use **Zod for type safety** during development, then generate JSON Schema at build time for runtime validation and external interoperability.

#### Step 1: Define Zod Schema with VAT Metadata

```typescript
// packages/my-agents/src/schemas/user.ts
import { z } from 'zod';
import { withVATResource, type VATResourceMeta } from '@vibe-agent-toolkit/agent-utils';

export const UserSchema = withVATResource(
  z.object({
    name: z.string().describe('User full name'),
    email: z.string().email().describe('User email address'),
    age: z.number().int().min(0).optional().describe('User age'),
  }),
  {
    resourceType: 'vat:schema',
    tags: ['user', 'auth'],
  }
);

export type User = z.infer<typeof UserSchema>;

// Alternative: Inline with type checking
export const UserSchema2 = z.object({
  name: z.string(),
  email: z.string().email(),
}).meta({
  'x-vat-resource': {
    resourceType: 'vat:schema',
    tags: ['user'],
  } satisfies VATResourceMeta,
});
```

#### Step 2: Generate JSON Schema with Checksum

```typescript
// At build time (automated by @vibe-agent-toolkit/agent-utils)
import { generateSchemaWithChecksum } from '@vibe-agent-toolkit/agent-utils';
import { UserSchema } from './schemas/user.ts';

await generateSchemaWithChecksum(
  UserSchema,
  './src/schemas/generated/User.schema.json',
  { name: 'User' }
);
```

**Generated JSON Schema** (committed to git):
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "User full name" },
    "email": { "type": "string", "format": "email", "description": "User email address" },
    "age": { "type": "integer", "minimum": 0, "description": "User age" }
  },
  "required": ["name", "email"],
  "x-vat-resource": {
    "resourceType": "vat:schema",
    "tags": ["user", "auth"],
    "checksum": "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
  }
}
```

### RFC 8785 + Subresource Integrity (SRI)

Checksums use two standards:
1. **RFC 8785 (JCS)** for deterministic JSON serialization
2. **Subresource Integrity (SRI)** for checksum format

#### Canonicalization (RFC 8785)

- **Canonical Form**: Lexicographic key ordering, deterministic number formatting
- **Standard**: IETF RFC 8785 - JSON Canonicalization Scheme
- Ensures same schema always produces same byte sequence

#### Checksum Format (SRI)

- **Format**: `sha256-base64hash`
- **Standard**: W3C Subresource Integrity
- **Algorithm**: SHA-256 (also used for Docker image digests)
- **Encoding**: Base64 (SRI standard; 44 characters for SHA-256 versus 64 in hex)

```typescript
// @vibe-agent-toolkit/agent-utils
import { canonicalize } from 'json-canonicalize'; // named export
import * as crypto from 'node:crypto';

/**
 * Compute SRI-format checksum using SHA-256
 * Format: sha256-base64hash
 * SHA-256 digest (as in Docker image digests), SRI encoding for web standards
 */
export function computeSchemaChecksum(schema: object): string {
  // Remove existing checksum before computing
  const cleaned = removeChecksum(schema);

  // RFC 8785 canonical form
  const canonical = canonicalize(cleaned);

  // SHA-256 hash with base64 encoding (SRI format)
  const hash = crypto.createHash('sha256')
    .update(canonical, 'utf-8')
    .digest('base64');

  return `sha256-${hash}`;
}

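// Returns a copy without any x-vat-resource.checksum; the input is not mutated
function removeChecksum(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(removeChecksum);
  }
  if (value && typeof value === 'object') {
    const result: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      if (key === 'x-vat-resource' && child && typeof child === 'object') {
        // Drop only the checksum; keep resourceType, tags, etc.
        const { checksum: _checksum, ...rest } = child as Record<string, unknown>;
        result[key] = removeChecksum(rest);
      } else {
        result[key] = removeChecksum(child);
      }
    }
    return result;
  }
  return value;
}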
```
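
The property that matters here is that key order must not affect the checksum. The sketch below demonstrates it using only `node:crypto`. Note that `canonicalJson` is a simplified stand-in written for this example (sorted keys, no whitespace); it is not a complete RFC 8785 implementation, since number and string edge cases are left to `JSON.stringify`, so real checksums should come from a JCS library.

```typescript
import { createHash } from 'node:crypto';

/**
 * Simplified canonical form: recursively sorted keys, no whitespace.
 * NOT a full RFC 8785 implementation; illustration only.
 */
function canonicalJson(value: unknown): string {
  if (value === undefined) {
    return 'null';
  }
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, child]) => child !== undefined)
      // String comparison in JS orders by UTF-16 code units, as RFC 8785 requires.
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, child]) => `${JSON.stringify(key)}:${canonicalJson(child)}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

/** SRI-format checksum: "sha256-" followed by the base64 digest. */
export function sriChecksum(value: unknown): string {
  const digest = createHash('sha256')
    .update(canonicalJson(value), 'utf8')
    .digest('base64');
  return `sha256-${digest}`;
}

// Same content, different key order: the checksums are identical.
const firstChecksum = sriChecksum({ type: 'object', required: ['name', 'email'] });
const secondChecksum = sriChecksum({ required: ['name', 'email'], type: 'object' });
console.log(firstChecksum === secondChecksum);
```

Array order is preserved on purpose: `required: ['name', 'email']` and `required: ['email', 'name']` hash differently, which matches JSON semantics where arrays are ordered.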

### Schema Manifest

Build process generates a manifest tracking all schemas:

```json
{
  "$schema": "https://vibe-agent-toolkit.dev/schema-manifest.schema.json",
  "generatedAt": "2026-01-13T10:30:00Z",
  "generator": "@vibe-agent-toolkit/agent-utils@0.2.0",
  "schemas": {
    "User": {
      "source": "src/schemas/user.ts",
      "output": "src/schemas/generated/User.schema.json",
      "checksum": "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
      "generatedAt": "2026-01-13T10:30:00Z"
    }
  }
}
```
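
A CI check against this manifest could compare each recorded checksum with a freshly computed one. The sketch below assumes the manifest shape shown above; `findStaleSchemas` and the `currentChecksum` callback are hypothetical, and the callback stands in for reading and hashing each generated file.

```typescript
interface SchemaManifest {
  generatedAt: string;
  generator: string;
  schemas: Record<
    string,
    { source: string; output: string; checksum: string; generatedAt: string }
  >;
}

/**
 * Returns the names of schemas whose recorded checksum differs from the
 * current one (or whose output file is missing), sorted alphabetically.
 */
export function findStaleSchemas(
  manifest: SchemaManifest,
  currentChecksum: (outputPath: string) => string | undefined
): string[] {
  return Object.entries(manifest.schemas)
    .filter(([, entry]) => currentChecksum(entry.output) !== entry.checksum)
    .map(([schemaName]) => schemaName)
    .sort();
}

// Placeholder checksums for illustration; real values come from the generator.
const exampleManifest: SchemaManifest = {
  generatedAt: '2026-01-13T10:30:00Z',
  generator: '@vibe-agent-toolkit/agent-utils@0.2.0',
  schemas: {
    User: {
      source: 'src/schemas/user.ts',
      output: 'src/schemas/generated/User.schema.json',
      checksum: 'sha256-recorded-user',
      generatedAt: '2026-01-13T10:30:00Z',
    },
  },
};

const onDisk = new Map<string, string>([
  ['src/schemas/generated/User.schema.json', 'sha256-edited-by-hand'],
]);
const staleSchemas = findStaleSchemas(exampleManifest, (outputPath) => onDisk.get(outputPath));
```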

### Schema Resource Registration

Schemas are registered as resources in the resource registry:

```typescript
// @vibe-agent-toolkit/agent-utils
import * as fs from 'node:fs';

/**
 * Load JSON Schema as a resource
 */
export function loadSchemaResource(uri: string): Resource {
  const content = fs.readFileSync(uri, 'utf-8');
  const schema = JSON.parse(content);

  const vatResource = schema['x-vat-resource'] || {};

  return {
    uri,
    resourceType: vatResource.resourceType || 'vat:schema',
    mimeType: 'application/schema+json',
    content,
    metadata: {
      title: schema.title,
      description: schema.description,
      tags: vatResource.tags || [],
      checksum: vatResource.checksum,
    },
  };
}

// Register schema resource
const schemaResource = loadSchemaResource('schemas/User.schema.json');
resourceRegistry.register(schemaResource);
```

### Usage in Type Registration

```typescript
// Register type that references schema resource
typeRegistry.register({
  id: 'data/user',
  name: 'User',
  description: 'User record for authentication',
  format: {
    mimeType: 'application/json',
    schemaUri: 'schemas/User.schema.json',  // Link to schema resource
  },
  tags: ['user', 'auth'],
});

// Access schema checksum from resource (get() returns undefined for unknown URIs)
const schemaResource = resourceRegistry.get('schemas/User.schema.json');
const checksum = schemaResource?.metadata.checksum;  // "sha256-47DEQp..."
```

---

## Package Structure

```
@vibe-agent-toolkit/
├── agent-schema/
│   ├── src/
│   │   ├── types/
│   │   │   ├── DataType.ts
│   │   │   ├── AgentConfig.ts
│   │   │   ├── SchemaReference.ts
│   │   │   └── index.ts
│   │   ├── registries/
│   │   │   ├── TypeRegistry.ts
│   │   │   ├── AgentRegistry.ts
│   │   │   └── index.ts
│   │   └── index.ts
│   └── package.json
│
├── agent-utils/                           # NEW - Build & runtime utilities
│   ├── src/
│   │   ├── schema/
│   │   │   ├── checksum.ts               # RFC 8785 + SHA-256 checksumming
│   │   │   ├── generate.ts               # Zod → JSON Schema generation
│   │   │   ├── manifest.ts               # Schema manifest generation
│   │   │   └── index.ts
│   │   ├── agent/
│   │   │   ├── validate.ts               # AgentConfig validation
│   │   │   ├── metadata.ts               # agent.yaml generation
│   │   │   ├── bundle.ts                 # Bundle agent for npm
│   │   │   └── index.ts
│   │   ├── validation/
│   │   │   ├── type-validator.ts         # DataType validation
│   │   │   ├── compatibility.ts          # Type/agent compatibility checks
│   │   │   └── index.ts
│   │   └── index.ts
│   └── package.json
│
├── agent-runtime/
│   ├── src/
│   │   ├── types/
│   │   │   ├── AgentData.ts
│   │   │   ├── AgentContext.ts
│   │   │   ├── QualityMetadata.ts
│   │   │   ├── ProvenanceMetadata.ts
│   │   │   └── index.ts
│   │   ├── interfaces/
│   │   │   ├── IAgent.ts
│   │   │   └── index.ts
│   │   └── index.ts
│   └── package.json
│
├── orchestration/
│   ├── src/
│   │   ├── interfaces/
│   │   │   ├── AgentOrchestrator.ts
│   │   │   └── FormatConverter.ts
│   │   ├── orchestrators/
│   │   │   ├── FilesystemOrchestrator.ts
│   │   │   ├── InMemoryOrchestrator.ts
│   │   │   └── index.ts
│   │   ├── converters/
│   │   │   ├── BuiltInFormatConverter.ts
│   │   │   └── index.ts
│   │   └── index.ts
│   └── package.json
│
└── vat-development-agents/               # Dogfoods agent-utils
    ├── src/
    │   ├── schemas/
    │   │   ├── agent-schema.ts           # Zod schemas
    │   │   └── generated/                # Generated JSON Schemas (committed)
    │   │       ├── AgentSchema.schema.json
    │   │       └── .schema-manifest.json
    │   ├── agents/
    │   │   └── example-agent/
    │   │       ├── agent.ts
    │   │       └── agent.yaml            # Generated from config
    │   └── index.ts
    ├── scripts/
    │   └── generate-schemas.ts           # Uses agent-utils
    └── package.json
```

### Dependency Graph

```
agent-schema          # Types & registries (lightweight)
    ↑
    |
agent-utils          # Generation & validation utilities
    ↑
    |
    ├─→ cli                        # CLI commands
    └─→ vat-development-agents     # Example agents (dogfooding)
```

---

## Key Design Decisions

### 1. **Type vs Slot Separation**
- **Types** are reusable data contracts (shared)
- **Slots** are agent-specific I/O bindings (local)
- Enables type reuse across agents

### 2. **Postel's Law (Robustness Principle)**
- Inputs: Accept multiple types (liberal)
- Outputs: Produce single type (conservative)
- Orchestrator handles format conversion

### 3. **Metadata on Instances, Not Types**
- Quality metadata attached by meta-agents at runtime
- Provenance metadata added by orchestrator automatically
- Types define contracts, instances carry runtime data

### 4. **Orchestration Agnostic**
- Agent code doesn't know about filesystem, HTTP, etc.
- Same agent runs on any orchestrator
- Orchestrator adapts to transport mechanism

### 5. **Meta-Agents as First-Class**
- Tagged with 'meta-agent' in registry
- Produce quality/validation metadata
- Enable quality gates in pipelines

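### 6. **Two Core Registries**
- **TypeRegistry**: Data type definitions
- **AgentRegistry**: Agent configurations
- **ResourceRegistry** holds the supporting artifacts (schemas, prompts, docs) that types and agents reference
- Everything else is derived or runtime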

### 7. **No Fine-Grained Versioning (Initially)**
- DataType and SchemaReference have no version field
- Add versioning later when strategy is determined (semver, content hashing, etc.)
- Agent version kept for provenance tracking only
- Avoids premature complexity without clear versioning strategy

### 8. **Zod-First Schema Development**
- Agent authors write Zod schemas for type safety
- JSON Schema generated at build time (committed to git)
- Runtime validation uses JSON Schema (language-agnostic)
- Single source of truth: Zod schema in TypeScript

### 9. **Schemas as Resources**
- Schemas are first-class resources, not inline objects
- Use existing resource infrastructure (checksums, link validation)
- DataType references schemas via schemaUri (never inline)
- Schemas registered in resource registry like markdown docs

### 10. **VAT Resource Type Taxonomy**
- Hierarchical semantic types (vat:schema, vat:prompt/system, etc.)
- Separate from physical MIME type (semantic vs format)
- Default MIME types per resource type
- Extensible namespace for new resource types
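
One way to picture the hierarchy is as a namespace plus a slash-separated path, with MIME defaults resolved by walking up toward the root. This is a sketch under stated assumptions: `parseResourceType`, `isKindOf`, and `defaultMimeTypeFor` are hypothetical helper names, and the entries in `defaultMimeTypes` are placeholder values chosen for illustration, not the toolkit's actual defaults.

```typescript
// Hypothetical helpers; the resource type strings follow the examples in this document.

/** Splits 'vat:prompt/system' into namespace 'vat' and segments ['prompt', 'system']. */
function parseResourceType(resourceType: string): { namespace: string; segments: string[] } {
  const separator = resourceType.indexOf(':');
  if (separator <= 0 || separator === resourceType.length - 1) {
    throw new Error(`Invalid resource type: ${resourceType}`);
  }
  return {
    namespace: resourceType.slice(0, separator),
    segments: resourceType.slice(separator + 1).split('/'),
  };
}

/** True when `child` equals `parent` or is nested beneath it in the hierarchy. */
function isKindOf(child: string, parent: string): boolean {
  const c = parseResourceType(child);
  const p = parseResourceType(parent);
  return (
    c.namespace === p.namespace &&
    p.segments.length <= c.segments.length &&
    p.segments.every((segment, i) => segment === c.segments[i])
  );
}

// Assumed defaults, for illustration only; the real mapping may differ.
const defaultMimeTypes = new Map<string, string>([
  ['vat:schema', 'application/schema+json'],
  ['vat:prompt', 'text/markdown'],
]);

/** Walks from the most specific type up to the root until a default MIME type is found. */
function defaultMimeTypeFor(resourceType: string): string | undefined {
  const { namespace, segments } = parseResourceType(resourceType);
  for (let depth = segments.length; depth > 0; depth--) {
    const candidate = `${namespace}:${segments.slice(0, depth).join('/')}`;
    const mimeType = defaultMimeTypes.get(candidate);
    if (mimeType !== undefined) return mimeType;
  }
  return undefined;
}
```

Keeping the semantic type and the MIME type in separate fields means a new resource type can be added to the namespace without touching format handling.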

### 11. **RFC 8785 + SRI for Checksums**
- RFC 8785 (JCS) for deterministic JSON serialization
- Subresource Integrity (SRI) format: `sha256-base64hash`
- SHA-256 algorithm (the same algorithm Docker uses for image digests)
- Base64 encoding (the W3C SRI format; a SHA-256 digest takes 44 characters versus 64 in hex, about 31% shorter)
- Can upgrade to sha384/sha512 later if needed
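
A minimal sketch of the checksum computation follows, using Node's built-in `crypto` module. The `canonicalize` function here is a simplification that sorts object keys and strips whitespace; it approximates RFC 8785 for typical schema content but is not a complete JCS implementation. Both function names are illustrative, not the toolkit's API.

```typescript
import { createHash } from 'node:crypto';

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/**
 * Simplified canonical serialization: object keys sorted, no insignificant whitespace.
 * An approximation of RFC 8785, not a full implementation.
 */
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  const members = Object.keys(value)
    .sort() // default sort compares UTF-16 code units
    .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key] as Json)}`);
  return `{${members.join(',')}}`;
}

/** Formats a SHA-256 digest of the canonical form as an SRI string. */
function sriChecksum(value: Json): string {
  const digest = createHash('sha256').update(canonicalize(value), 'utf8').digest('base64');
  return `sha256-${digest}`;
}

console.log(canonicalize({ b: 1, a: 'x' })); // {"a":"x","b":1}
```

Because the serialization is deterministic, two schemas that differ only in key order produce the same checksum, which is what lets the generated files stay stable in version control.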

### 12. **x-vat-resource Namespace**
- Single namespace for VAT extensions in JSON Schema
- Contains: resourceType, tags, checksum
- Prevents field collision with other extensions
- JSON Schema best practice

### 13. **Strongly-Typed Zod Metadata**
- VATResourceMeta interface for type safety
- withVATResource() helper for adding metadata
- Metadata becomes 'x-vat-resource' in JSON Schema
- Checksum computed at generation time, not in Zod

### 14. **Generated Files Committed to Git**
- JSON Schema files committed alongside Zod schemas
- Deterministic generation (only changes when content changes)
- Schema manifest tracks checksums
- CI validates schemas stay in sync with source
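
The CI check in the last bullet could be as simple as regenerating every schema in memory and comparing the result with what is committed. The sketch below assumes both sides are available as maps from file path to serialized schema text; `checkSchemasInSync`, `isInSync`, and the `SyncReport` shape are hypothetical names for illustration.

```typescript
/** Maps a schema file path to its serialized JSON Schema text. */
type SchemaFiles = ReadonlyMap<string, string>;

interface SyncReport {
  stale: string[]; // committed content differs from a fresh generation
  missing: string[]; // generated but not committed
  orphaned: string[]; // committed but no longer generated
}

/** Compares committed schema files against freshly generated ones. */
function checkSchemasInSync(committed: SchemaFiles, generated: SchemaFiles): SyncReport {
  const report: SyncReport = { stale: [], missing: [], orphaned: [] };
  generated.forEach((content, path) => {
    const existing = committed.get(path);
    if (existing === undefined) {
      report.missing.push(path);
    } else if (existing !== content) {
      report.stale.push(path);
    }
  });
  committed.forEach((_content, path) => {
    if (!generated.has(path)) report.orphaned.push(path);
  });
  return report;
}

/** True when CI should pass. */
function isInSync(report: SyncReport): boolean {
  return report.stale.length + report.missing.length + report.orphaned.length === 0;
}
```

A plain string comparison only works because generation is deterministic; if output depended on key order or timestamps, every CI run would report stale files.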

### 15. **Single agent-utils Package**
- No runtime vs build-time split (YAGNI)
- All agent development utilities in one package
- Developer tooling, not production constraints
- Simplifies architecture, reduces cognitive overhead

---

## Future Enhancements

The following features are designed but not yet implemented. They represent future capabilities that may be added based on requirements.

### Quality Metadata

Quality metadata enables tracking confidence, completeness, and validation results as data flows through agent pipelines.

```typescript
/**
 * Quality metadata - attached to data instances
 * Produced by quality evaluation meta-agents
 */
export interface QualityMetadata {
  /** Confidence score (0.0-1.0) */
  confidence?: number;

  /** Completeness score (0.0-1.0) */
  completeness?: number;

  /** Validation results */
  validation?: {
    /** Schema validation passed */
    schemaValid: boolean;

    /** Business rule validation passed */
    rulesValid: boolean;

    /** Validation issues */
    issues?: ValidationIssue[];
  };

  /** Quality assessment from judge agent */
  assessment?: {
    /** Overall quality score */
    score: number;

    /** Judge agent ID */
    judgeId: string;

    /** Judge agent version */
    judgeVersion: string;

    /** Timestamp */
    timestamp: string;

    /** Detailed quality dimensions */
    dimensions?: Record<string, number>;

    /** Human-readable feedback */
    feedback?: string;
  };

  /** Custom quality dimensions */
  dimensions?: Record<string, number>;

  /** Statistics about the data */
  statistics?: Record<string, number | string>;
}

export interface ValidationIssue {
  severity: 'error' | 'warning' | 'info';
  path: string;  // JSON path to problematic field
  message: string;
  code?: string;
}
```

**Use cases:**
- Tracking OCR confidence scores through document processing pipelines
- Aggregating quality across multi-agent workflows
- Implementing quality gates (reject outputs below threshold)
- Debugging pipeline degradation

**Implementation considerations:**
- Who computes quality? (agent, meta-agent, orchestrator)
- When is it computed? (before write, after write, on read)
- How to aggregate quality across pipelines?
- Quality propagation algebra (e.g., multiplying confidence scores across pipeline stages)
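
As one possible answer to the aggregation question, the sketch below multiplies per-stage confidence scores and applies a threshold gate. Multiplication assumes the stages fail independently, which may not hold in practice, and treating a missing score as a gate failure is a conservative choice made here for illustration. `combineConfidence` and `passesQualityGate` are hypothetical names.

```typescript
/**
 * Combines per-stage confidence scores by multiplication.
 * Assumes stage errors are independent; an empty pipeline yields 1.
 */
function combineConfidence(stageConfidences: number[]): number {
  return stageConfidences.reduce((product, confidence) => {
    if (!(confidence >= 0 && confidence <= 1)) {
      throw new RangeError(`Confidence must be within 0.0-1.0, got ${confidence}`);
    }
    return product * confidence;
  }, 1);
}

/** Rejects outputs below the threshold; a missing score fails the gate. */
function passesQualityGate(confidence: number | undefined, threshold: number): boolean {
  return confidence !== undefined && confidence >= threshold;
}

// Example: OCR at 0.9 followed by extraction at 0.8.
const pipelineConfidence = combineConfidence([0.9, 0.8]);
console.log(passesQualityGate(pipelineConfidence, 0.7));
```

Other algebras (taking the minimum, or a weighted mean) would fit the same two-function shape, so the choice can be deferred until real pipelines show which one tracks observed quality.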

---

## Usage Example (Complete Workflow)

### Step 1: Define Zod Schema with VAT Metadata

```typescript
// src/schemas/word-count.ts
import { z } from 'zod';
import { withVATResource } from '@vibe-agent-toolkit/agent-utils';

export const WordCountSchema = withVATResource(
  z.object({
    totalWords: z.number().int().min(0).describe('Total word count'),
    uniqueWords: z.number().int().min(0).describe('Unique word count'),
    frequencies: z.record(z.number()).describe('Word frequency map'),
  }),
  {
    resourceType: 'vat:schema',
    tags: ['statistics', 'text-processing'],
  }
);

export type WordCount = z.infer<typeof WordCountSchema>;
```

### Step 2: Generate JSON Schema (Build Time)

```typescript
// scripts/generate-schemas.ts
import { generateSchemaWithChecksum } from '@vibe-agent-toolkit/agent-utils';
import { WordCountSchema } from '../src/schemas/word-count.ts';

await generateSchemaWithChecksum(
  WordCountSchema,
  './src/schemas/generated/WordCount.schema.json',
  { name: 'WordCount' }
);
```

**Generated** `src/schemas/generated/WordCount.schema.json`:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "totalWords": { "type": "integer", "minimum": 0, "description": "Total word count" },
    "uniqueWords": { "type": "integer", "minimum": 0, "description": "Unique word count" },
    "frequencies": { "type": "object", "additionalProperties": { "type": "number" }, "description": "Word frequency map" }
  },
  "required": ["totalWords", "uniqueWords", "frequencies"],
  "x-vat-resource": {
    "resourceType": "vat:schema",
    "tags": ["statistics", "text-processing"],
    "checksum": "sha256-b7e9c3f1a8d2K8p3L4m5N6o7P8q9R0s1T2u3V4w5X6y="
  }
}
```

### Step 3: Register Schema as Resource

```typescript
// src/resources.ts
import { loadSchemaResource } from '@vibe-agent-toolkit/agent-utils';
import { resourceRegistry } from '@vibe-agent-toolkit/resources';

// Load and register schema resource
const wordCountSchema = loadSchemaResource('schemas/generated/WordCount.schema.json');
resourceRegistry.register(wordCountSchema);
```

### Step 4: Register Types

```typescript
// src/types.ts
import { typeRegistry } from '@vibe-agent-toolkit/agent-schema';

// Unstructured type (no schema)
typeRegistry.register({
  id: 'document/text',
  name: 'Text Document',
  description: 'Plain text document',
  format: { mimeType: 'text/plain' },
  tags: ['document', 'unstructured'],
});

// Structured type (references schema resource)
typeRegistry.register({
  id: 'data/word-count',
  name: 'Word Count Data',
  description: 'Word frequency statistics',
  format: {
    mimeType: 'application/json',
    schema: { schemaUri: 'schemas/generated/WordCount.schema.json' },  // Link to resource
  },
  tags: ['structured', 'statistics'],
});
```

### Step 5: Define Agent Config

```typescript
// src/agents/word-counter/config.ts
import type { AgentConfig } from '@vibe-agent-toolkit/agent-schema';

export const wordCounterConfig: AgentConfig = {
  id: 'word-counter',
  name: 'Word Counter',
  description: 'Counts word frequencies in text documents',
  version: '1.0.0',
  inputs: [
    {
      name: 'document',
      types: ['document/text', 'document/markdown'],
      description: 'Input document to analyze',
      required: true
    }
  ],
  outputs: [
    {
      name: 'wordCount',
      type: 'data/word-count',
      description: 'Word frequency statistics',
      required: true
    }
  ],
  tags: ['text-processing', 'analysis']
};
```
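
Because agent configs reference types only by ID, a registration-time check can catch IDs that were never registered. The sketch below is self-contained and uses a hypothetical `findUnknownTypeIds` helper with a simplified config shape; it does not use the real `typeRegistry` API. Run against the two types registered in Step 4, it flags `document/markdown`, which would need to be registered elsewhere.

```typescript
// Simplified view of the parts of an agent config that reference type IDs.
interface SlotReferences {
  inputs: { name: string; types: string[] }[];
  outputs: { name: string; type: string }[];
}

/** Returns every referenced type ID that is absent from the known set, without duplicates. */
function findUnknownTypeIds(config: SlotReferences, knownTypeIds: ReadonlySet<string>): string[] {
  const unknown: string[] = [];
  const check = (typeId: string): void => {
    if (!knownTypeIds.has(typeId) && unknown.indexOf(typeId) === -1) {
      unknown.push(typeId);
    }
  };
  for (const input of config.inputs) {
    for (const typeId of input.types) check(typeId);
  }
  for (const output of config.outputs) check(output.type);
  return unknown;
}

// Type IDs registered in Step 4 of this example.
const registeredTypeIds = new Set<string>(['document/text', 'data/word-count']);

const unknownTypeIds = findUnknownTypeIds(
  {
    inputs: [{ name: 'document', types: ['document/text', 'document/markdown'] }],
    outputs: [{ name: 'wordCount', type: 'data/word-count' }],
  },
  registeredTypeIds,
);
console.log(unknownTypeIds); // [ 'document/markdown' ]
```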

### Step 6: Implement Agent

```typescript
// src/agents/word-counter/agent.ts
import type { IAgent, AgentContext } from '@vibe-agent-toolkit/agent-runtime';
import { WordCountSchema, type WordCount } from '../../schemas/word-count.ts';

export class WordCounterAgent implements IAgent {
  async execute(context: AgentContext): Promise<void> {
    // Read input (orchestrator handles type conversion)
    const document = await context.readInput('document');
    const text = document.content.toString();

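    // Process (a Map avoids collisions with Object.prototype keys such as "constructor")
    const words = text.split(/\s+/).filter(w => w.length > 0);
    const counts = new Map<string, number>();
    for (const word of words) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
    const frequencies: Record<string, number> = Object.fromEntries(counts);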

    const result: WordCount = {
      totalWords: words.length,
      uniqueWords: Object.keys(frequencies).length,
      frequencies,
    };

    // Validate with Zod before writing
    const validated = WordCountSchema.parse(result);

    // Write output (the `quality` option depends on Quality Metadata, listed under Future Enhancements)
    await context.writeOutput('wordCount', validated, {
      quality: {
        confidence: 1.0,
        completeness: 1.0,
      }
    });
  }
}
```
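
Since the agent only talks to its context, its logic can be exercised without any orchestrator. The sketch below is an assumption-laden stand-in: `StubContext` is a hypothetical, reduced version of the context shape (the real `AgentContext` may differ), and `countWordFrequencies`, `createInMemoryContext`, and `runWordCounter` are illustrative names. Zod validation is omitted to keep the block free of external dependencies.

```typescript
// Hypothetical minimal context; the real AgentContext interface may differ.
interface StubContext {
  readInput(name: string): Promise<{ content: string }>;
  writeOutput(name: string, value: unknown): Promise<void>;
}

interface WordCountResult {
  totalWords: number;
  uniqueWords: number;
  frequencies: Record<string, number>;
}

/** Pure counting logic, independent of any transport. */
function countWordFrequencies(text: string): WordCountResult {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const counts = new Map<string, number>();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return {
    totalWords: words.length,
    uniqueWords: counts.size,
    frequencies: Object.fromEntries(counts),
  };
}

/** In-memory stand-in for an orchestrator-provided context. */
function createInMemoryContext(inputs: Map<string, string>) {
  const outputs = new Map<string, unknown>();
  const context: StubContext = {
    async readInput(name) {
      const content = inputs.get(name);
      if (content === undefined) throw new Error(`Missing input: ${name}`);
      return { content };
    },
    async writeOutput(name, value) {
      outputs.set(name, value);
    },
  };
  return { context, outputs };
}

/** Same read-process-write shape as the agent above, bound to the stub context. */
async function runWordCounter(context: StubContext): Promise<void> {
  const document = await context.readInput('document');
  await context.writeOutput('wordCount', countWordFrequencies(document.content));
}
```

In a unit test, `createInMemoryContext(new Map([['document', 'some text']]))` supplies the input, and the `outputs` map can be inspected after `runWordCounter` resolves.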

### Step 7: Register and Execute

```typescript
// src/main.ts
import { agentRegistry } from '@vibe-agent-toolkit/agent-schema';
import { FilesystemOrchestrator } from '@vibe-agent-toolkit/orchestration';
import { wordCounterConfig } from './agents/word-counter/config.ts';
import { WordCounterAgent } from './agents/word-counter/agent.ts';

// Register agent
agentRegistry.register(wordCounterConfig);

// Execute with orchestrator
const orchestrator = new FilesystemOrchestrator('./workspace');
orchestrator.registerAgent(wordCounterConfig, new WordCounterAgent());
await orchestrator.executeAgent('word-counter');

// Access output and its type
const output = await orchestrator.getOutput('word-counter', 'wordCount');
console.log('Type:', output.type.id);  // Type ID, e.g. 'data/word-count'
console.log('Word count:', output.content);
```

### Workflow Summary

1. **Development**: Write Zod schemas with VAT metadata (`withVATResource()`)
2. **Build**: Generate JSON Schemas with x-vat-resource and checksums (SRI format)
3. **Register Resources**: Load schemas as resources in resource registry
4. **Register Types**: Register types that reference schema URIs
5. **Define Agents**: Agent configs reference type IDs
6. **Implement**: Agents use Zod for validation, write to context
7. **Execute**: Orchestrator handles I/O, type conversion, metadata

**Key Points**:
- Schemas are resources (not inline objects)
- x-vat-resource contains resourceType, tags, checksum
- Checksums use SRI format: `sha256-base64hash`
- Types reference schemas via schemaUri
- Link validation works automatically (schemas are resources)

