[FEATURE]: Optional Promptfoo Policy Plugin Integration

## Summary

Integrate Promptfoo's policy plugin as an **optional** red-teaming source in the Smith test generation pipeline. The plugin generates disallow and malicious test cases scoped to specific user profiles and policy constraints. It complements Smith's core test generation pipeline and does not replace it.

## Motivation

Traditional red-teaming tools attack at the guardrail/prompt level, testing whether the LLM can be jailbroken or tricked into harmful outputs. However, they do not consider system variables or structured policy rules. Promptfoo's policy plugin fills this gap by generating adversarial test cases that target policy boundaries based on user contexts.

By comparison, Smith's generation is more automatic: it decomposes each user context by mutating system variables, with no manual configuration. Promptfoo requires purpose-built user contexts, carefully crafted `testGenerationInstructions`, and a separate configuration file, so it takes more effort and more user knowledge. We will include several example configurations to lower the barrier, but Promptfoo remains optional, intended for teams willing to invest in the extra setup.

Smith's core pipeline and Promptfoo's policy plugin both generate adversarial test cases against policy rules, but from different angles. Smith decomposes guidance into structured conditions and generates cases systematically, while Promptfoo generates attack prompts based on specific user context and policy text. Having both may provide broader coverage.

This integration is **optional**. The core Smith pipeline works without Promptfoo. Teams that have Promptfoo installed (`npm install -g promptfoo`) can enable it for additional disallow/malicious test coverage. 

## Design Considerations

### Configuration File

Promptfoo requires a dedicated configuration file (`promptfooconfig.yaml`) per target agent. This file must be manually authored and defines:

- **`purpose`**: Agent description
- **`vars`**: Top-level system variables
- **`contexts`**: User profiles to generate attacks for (id, purpose, vars)
- **`provider`**: LLM for generation (e.g., `ollama:chat:qwen3.5:latest`)
- **`plugins.policy`**: The policy text to attack against
- **`testGenerationInstructions`**: Constraints on generation quality
- **`strategies`**: Attack strategies

### Variable Format Translation

Promptfoo only supports **string variables** in its output. Smith's policy engine requires typed variables (lists, booleans, integers). Smith handles this translation in `convert_test_case.py` using `system_vars.json` as the type reference:

- Promptfoo outputs: `roles: "employee"`, `approval: "false"`, `queries_this_session: "5"`
- Smith converts to: `roles: ["employee"]`, `approval: false`, `queries_this_session: 5`

Conversion rules:
- String → list (if reference is a list)
- String → int/float (if reference is numeric)
- String `"true"`/`"false"` → boolean (applied after type conversion)
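The conversion rules above can be sketched as follows. This is a minimal illustration, not the actual contents of `convert_test_case.py`: the function names `coerce_value` and `convert_vars` are hypothetical, the reference dict stands in for a loaded `system_vars.json`, and splitting list values on commas is an assumption about how multi-valued strings might be handled.

```python
from __future__ import annotations

from typing import Any


def coerce_value(raw: str, reference: Any) -> Any:
    """Coerce one Promptfoo string value to the type of its reference value."""
    value: Any = raw
    if isinstance(reference, list):
        # Assumption: a comma-separated string maps to several list items.
        value = [part.strip() for part in raw.split(",") if part.strip()]
    elif isinstance(reference, bool):
        # bool is a subclass of int in Python, so skip the numeric branches;
        # the boolean step below handles this value.
        pass
    elif isinstance(reference, int):
        value = int(raw)
    elif isinstance(reference, float):
        value = float(raw)

    # Boolean step, applied after type conversion: any value that is still
    # the string "true" or "false" becomes a real boolean.
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        value = value.strip().lower() == "true"
    return value


def convert_vars(promptfoo_vars: dict[str, str], reference: dict[str, Any]) -> dict[str, Any]:
    """Convert every variable; names missing from the reference get only the boolean step."""
    return {
        name: coerce_value(raw, reference.get(name))
        for name, raw in promptfoo_vars.items()
    }


if __name__ == "__main__":
    reference_vars = {"roles": ["admin"], "approval": True, "queries_this_session": 0}
    promptfoo_vars = {"roles": "employee", "approval": "false", "queries_this_session": "5"}
    print(convert_vars(promptfoo_vars, reference_vars))
```

Checking `bool` before `int` matters here: without that guard, a boolean reference would route `"false"` into `int()` and raise a `ValueError`.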

### Output

All generated cases are labeled `promptfoo_malicious` and placed in the `promptfoo_malicious/` test directory.

## Current State

- [x] `promptfooconfig.yaml` templates created for HR Agent, Call-for-Papers, Car Price examples
- [x] `attack_promptfoo.py` invokes Promptfoo and extracts cases with system variables
- [x] `convert_test_case.py` handles type conversion from Promptfoo's string-only format
- [x] Handled `OLLAMA_BASE_URL` configuration (Promptfoo uses the native Ollama API at `/api/chat`, not the OpenAI-compatible `/v1` endpoint)
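
One way the `OLLAMA_BASE_URL` mismatch could be handled is to strip an OpenAI-style `/v1` suffix before passing the value to Promptfoo, so the native `/api/chat` path resolves against the server root. The helper below is a hypothetical sketch under that assumption; it is not taken from `attack_promptfoo.py`, and the actual handling may differ.

```python
def native_ollama_base_url(url: str) -> str:
    """Return the server root for a base URL that may end in an OpenAI-style /v1 path."""
    trimmed = url.strip().rstrip("/")
    if trimmed.endswith("/v1"):
        # Drop the OpenAI-compatible suffix; the native API lives under /api/.
        trimmed = trimmed[: -len("/v1")]
    return trimmed


if __name__ == "__main__":
    print(native_ollama_base_url("http://localhost:11434/v1/"))
```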

## Open Work

- [ ] Make Promptfoo an optional setting in `.env` (it currently runs by default during test generation)
- [ ] Translate Promptfoo cases to CMF format
- [ ] Cross-validate Promptfoo cases against the policy (observed 2–36% mislabel rate depending on policy complexity)
- [ ] Integrate Promptfoo cases into **policy patching**
- [ ] Document `promptfooconfig.yaml` authoring (context design, provider setup, variable mapping)
- [ ] Automatically generate `promptfooconfig.yaml` from `guidance.txt` and `system_vars.json` to reduce manual effort
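
For the first open item, the opt-in gate might look like the sketch below. The variable name `SMITH_ENABLE_PROMPTFOO` and both function names are hypothetical, and the sketch assumes the `.env` file has already been loaded into the process environment. It defaults to off and also checks that the `promptfoo` executable is on `PATH`, so the core pipeline keeps working when Promptfoo is not installed.

```python
import os
import shutil
from typing import Mapping, Optional

# Hypothetical flag name; the real setting has not been decided yet.
FLAG_NAME = "SMITH_ENABLE_PROMPTFOO"


def flag_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the opt-in flag holds an explicit truthy value."""
    env = os.environ if env is None else env
    return env.get(FLAG_NAME, "").strip().lower() in ("1", "true", "yes", "on")


def promptfoo_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Run the Promptfoo step only if it is requested and the CLI is installed."""
    return flag_is_set(env) and shutil.which("promptfoo") is not None


if __name__ == "__main__":
    if promptfoo_enabled():
        print("Promptfoo step enabled")
    else:
        print("Promptfoo step skipped; running core Smith pipeline only")
```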

## Known Limitations

- **Mislabeled cases**: The plugin may generate prompts for actions that the policy actually permits in the given context. Error rates vary (2–36% observed). Roles with broad permissions (e.g., "analyst" with full access) are most affected.
- **String-only variables**: Promptfoo does not support typed variables natively, requiring Smith to perform post-processing conversion.
- **Manual configuration**: `promptfooconfig.yaml` must currently be authored by hand. Auto-generation from existing Smith inputs is planned as a future feature.
- **Provider dependency**: Generation requires a running LLM (a local Ollama instance or a Promptfoo cloud account).


