fix(13) + docs(14): Responses API blocklist workaround + section 14 intro + scan artifacts #16

Merged: corticalstack merged 5 commits into main from fix/responses-api-blocklist-service-bug on May 27, 2026

corticalstack (Owner) commented May 27, 2026

Summary

Bundles three concerns into one release. All work uses the established agent_reference / Responses API pattern; no API regressions.

1. Guardrails: Responses API + customBlocklists service bug workaround

13-guardrails/13-02-create-bank-agent.ipynb and 13-03-demo-guardrails.ipynb returned InternalServerError: 500 on every Responses API call. Empirically isolated: when ANY customBlocklists entry is attached to the RAI policy (Prompt-side, Completion-side, or both), the Responses API runtime returns 500 on happy-path content while still correctly returning 400 content_filter on blocked content. Same policy works through Chat Completions. This is the service-side analogue of the Java SDK array-shape issue #49196.

Fix: customBlocklists is now an empty list in 13-01's RAI policy body. The bank-demo-blocklist resource is still created (visible in portal, two-line re-attachment once Microsoft fixes the service). Standard filters + Prompt Shields still work via the Responses API.

| Policy config | Responses API |
| --- | --- |
| 12 filters, 0 blocklists | ✅ 5/5 |
| 12 filters, Prompt-side blocklist | ❌ 5/5 fail |
| 12 filters, Completion-side blocklist | ❌ 5/5 fail |
| 12 filters, Prompt + Completion (original) | ❌ consistent 500 |
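
The pass/fail distinction in the table can be sketched as a small classifier. This is a minimal illustration, not code from the notebooks: the function names and the `(status_code, error_code)` tuple shape are assumptions, and the only facts taken from the description are that happy-path calls fail with HTTP 500 while blocked content still returns HTTP 400 with `content_filter`.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify(status_code: int, error_code: Optional[str] = None) -> str:
    """Map one Responses API call result onto an outcome label.

    The labels are hypothetical; they mirror the three behaviours the
    PR description distinguishes.
    """
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 400 and error_code == "content_filter":
        return "blocked"  # the guardrail fired as intended
    if status_code >= 500:
        return "service_error"  # e.g. the 500 seen with a blocklist attached
    return "other"


def summarize(results: Iterable[Tuple[int, Optional[str]]]) -> Counter:
    """Tally outcome labels for a batch of (status_code, error_code) pairs."""
    return Counter(classify(status, code) for status, code in results)
```

Running `summarize` over the five probe calls for each policy configuration would give one row of the table; the key point is that `service_error` and `blocked` are counted separately, so a 500 is never mistaken for a filter hit.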

Detailed comment on the RAI policy cell in 13-01 captures the bug, the empirical evidence (including that the blocklist itself has been verified end-to-end via the cached 13-03 outputs from a previous run when it was attached), and the two-line re-enable path.

2. Section 14: new intro page + scan artifacts

  • NEW 14-red-teaming/14-00-red-teaming.md - matches the NN-00-* intro page pattern every other section has. Covers PyRIT, the region constraint, the two notebooks, the callback/APIM architecture, and links to Microsoft docs + PyRIT GitHub.
  • Committed redteam_basic_output/ and redteam_advanced_output/{strategies,multilang,custom}/ (~110KB total) as demo artifacts so readers can see what the scans produce without running them. No tenant identifiers in any of the JSON.
  • Committed custom_attack_prompts.json as the source-of-truth seed file for the custom-objectives scan.

3. Cleanup

  • Legacy Lab16-* scan names → redteam-*: PyRIT scan_name arguments in 14-01 / 14-02 escaped the 0.8.0 "Lab N" cleanup pass. Renamed in source cells, cached outputs, and the committed scan JSON.
  • Path scrubbing: 86 occurrences of /home/jp/... removed from 14-01 / 14-02 cached outputs and tracebacks, replaced with <repo-root> and <uv-python> per CONTRIBUTING.md notebook-output hygiene policy.
  • Refreshed 13-02 / 13-03 outputs to reflect post-fix behaviour (agent v2; PII / blocklist categories visibly do not block; clean + prompt-injection still pass).
  • Re-added [redteam] extra to azure-ai-evaluation in pyproject (pulls in PyRIT for section 14).

Patch release 0.8.9.

Test plan

  • 13-02 smoke test returns a normal answer through the Responses API
  • 13-03 clean banking + prompt-injection categories pass; PII + blocklist categories show as not-blocked, matching the updated category table
  • 14-01 and 14-02 run end-to-end (with [redteam] extra installed)
  • 14-00-red-teaming.md renders correctly, all internal links resolve
  • Repo-wide grep for /home/jp and Lab16 returns no results
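
The last check in the test plan could be automated along these lines. This is a sketch under assumptions: the function name, the file suffixes scanned, and the decision to skip `.git` are illustrative choices, not part of the repository.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# Strings the test plan says must no longer appear anywhere in the repo.
FORBIDDEN = ("/home/jp", "Lab16")


def find_forbidden(
    root: str,
    patterns: Sequence[str] = FORBIDDEN,
    suffixes: Sequence[str] = (".ipynb", ".json", ".md", ".py", ".toml"),
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, pattern) for every forbidden substring found."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories, VCS internals, and file types we do not scan.
        if not path.is_file() or ".git" in path.parts:
            continue
        if path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in patterns:
                if pattern in line:
                    hits.append((str(path), lineno, pattern))
    return hits
```

An empty return value from `find_forbidden(".")` corresponds to the "returns no results" condition in the test plan.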

…es API service bug

13-guardrails was returning HTTP 500 InternalServerError on every Responses
API call to the contoso-bank-agent. Root cause isolated empirically: when
ANY customBlocklists entry is attached to the RAI policy (Prompt-side,
Completion-side, or both), the Responses API runtime returns 500 on
happy-path content while still correctly returning 400 content_filter on
blocked content. The exact same policy works fine through Chat Completions.

This is the service-side analogue of the Java SDK array-shape issue at
Azure/azure-sdk-for-java#49196 - the service
returns custom_blocklists as a JSON array in content_filter_results and
the Responses runtime appears to have the same array-vs-object mismatch
in its response assembly.
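
To make the array-vs-object mismatch concrete, here is a sketch of a client-side normalizer that tolerates both shapes. It is an illustration only: the object form with a `details` list is an assumption about the alternative wire shape, the function name is hypothetical, and nothing here changes the server-side 500 described above.

```python
from typing import Any, Dict, List


def normalize_custom_blocklists(value: Any) -> List[Dict[str, Any]]:
    """Return blocklist hit records as a list, whatever shape was received.

    Accepted shapes (the second is an assumption for illustration):
      - a JSON array of hit records
      - a JSON object carrying the records under a "details" key
    """
    if value is None:
        return []
    if isinstance(value, list):
        return [dict(item) for item in value]
    if isinstance(value, dict):
        return [dict(item) for item in value.get("details", [])]
    raise TypeError(f"unexpected custom_blocklists shape: {type(value).__name__}")
```

A parser that expects only one of the two shapes fails on the other, which is the class of error the commit message suspects in the Responses runtime's response assembly.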

Fix: customBlocklists is now an empty list in 13-01's RAI policy body. The
bank-demo-blocklist resource is still created (visible in portal, two-line
re-attachment once Microsoft fixes the service). Standard content filters
+ Prompt Shields (Jailbreak / Indirect Attack / Protected Material) still
work via the Responses API path used by every other agent notebook.
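
The workaround can be sketched as a policy-body builder with the blocklist attachment behind a flag. This is not the 13-01 cell itself: the function name and the `attach_blocklist` flag are hypothetical, and the field names (`contentFilters`, `customBlocklists`, `blocklistName`, `blocking`, `source`) are assumed to follow the Azure RAI policy resource schema.

```python
from typing import Any, Dict, List, Sequence


def build_rai_policy_body(
    content_filters: Sequence[Dict[str, Any]],
    attach_blocklist: bool = False,
    blocklist_name: str = "bank-demo-blocklist",
) -> Dict[str, Any]:
    """Build an RAI policy request body.

    With attach_blocklist=False (the workaround) customBlocklists is an
    empty list, so only the standard filters apply.
    """
    custom_blocklists: List[Dict[str, Any]] = []
    if attach_blocklist:
        # Re-enable path: attach the blocklist on both sides of the exchange.
        custom_blocklists = [
            {"blocklistName": blocklist_name, "blocking": True, "source": source}
            for source in ("Prompt", "Completion")
        ]
    return {
        "properties": {
            "contentFilters": list(content_filters),
            "customBlocklists": custom_blocklists,
        }
    }
```

Keeping the attachment behind a single flag means re-enabling the blocklist after a service fix is a one-argument change, while the blocklist resource itself stays in place.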

Documentation updates:
- 13-00-guardrails.md: Known limitation note up top + updated architecture
  diagram + revised portal-fallback steps
- 13-01: detailed comment on the RAI policy cell explaining the bug, the
  empirical evidence, the workaround, and how to re-enable
- 13-03: header note that PII / blocklist scenarios will not block until
  the service bug is fixed (clean banking + prompt injection still demo cleanly)

Add an "Important context" paragraph to the KNOWN ISSUE comment in 13-01
and the Known limitation callout in 13-00 to make explicit that the
blocklist mechanism has been verified previously in this notebook series:
the cached outputs of 13-03-demo-guardrails.ipynb were captured when
customBlocklists was attached and show all 5 PII inputs + all 5
codename/competitor prompts blocking correctly through the Responses API.

This matters because the service bug isn't a "the blocklist never worked
here" situation - it is a regression in the Responses API runtime's
response assembly. When Microsoft fixes the service and the blocklist is
re-attached, no further demo verification is needed; the cached results
are direct evidence the policy + blocklist combination behaves correctly.
…am artifacts

Section 13 cached outputs refreshed:
- 13-02-create-bank-agent.ipynb: ran against the customBlocklists-empty
  policy, agent now at v2, smoke test returns a normal answer
- 13-03-demo-guardrails.ipynb: re-run produces the new expected behaviour
  (clean + prompt-injection still pass; PII + blocklist categories
  visibly do not block, consistent with the doc updates in the previous
  commits)

Section 14 (red teaming):
- NEW 14-00-red-teaming.md section overview: introduces the AI Red
  Teaming Agent (PyRIT), the region constraint, the two notebooks, the
  callback/APIM architecture, and links to the official Microsoft docs
  + PyRIT GitHub. Brings section 14 in line with every other section's
  NN-00-* intro page.
- Committed PyRIT scan output artifacts (redteam_basic_output/ and
  redteam_advanced_output/{strategies,multilang,custom}/) so readers
  can see what the scans produce without running them. ~110KB total,
  verified to contain no tenant identifiers.
- Added custom_attack_prompts.json as the source-of-truth seed file for
  the custom-objectives scan.

Cleanup:
- Renamed PyRIT scan_name arguments from legacy Lab16-* to descriptive
  redteam-* in both notebooks (and in cached outputs / committed scan
  JSON). The Lab16 prefix escaped the 0.8.0 "Lab N" cleanup pass.
- Scrubbed absolute local paths (/home/jp/...) from cached outputs and
  Python stack traces (6 occurrences in 14-01, 80 in 14-02) to
  <repo-root> and <uv-python> placeholders, per CONTRIBUTING.md
  notebook-output hygiene policy.
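
The path scrubbing can be sketched as a pass over the `outputs` of each cell in a parsed notebook. This is an assumed approach, not the script actually used: the function names are hypothetical, and the concrete prefixes mapped to `<repo-root>` and `<uv-python>` are not given in this PR, so they are left as caller-supplied parameters.

```python
from typing import Any, Dict, Sequence, Tuple

Replacement = Tuple[str, str]  # (absolute path prefix, placeholder)


def scrub_text(text: str, replacements: Sequence[Replacement]) -> str:
    """Replace each prefix with its placeholder, longest prefix first."""
    ordered = sorted(replacements, key=lambda pair: len(pair[0]), reverse=True)
    for prefix, placeholder in ordered:
        text = text.replace(prefix, placeholder)
    return text


def _scrub(obj: Any, replacements: Sequence[Replacement]) -> Any:
    """Recursively scrub every string inside a JSON-like structure."""
    if isinstance(obj, str):
        return scrub_text(obj, replacements)
    if isinstance(obj, list):
        return [_scrub(item, replacements) for item in obj]
    if isinstance(obj, dict):
        return {key: _scrub(value, replacements) for key, value in obj.items()}
    return obj


def scrub_notebook_outputs(
    notebook: Dict[str, Any], replacements: Sequence[Replacement]
) -> Dict[str, Any]:
    """Scrub cached outputs and tracebacks; source cells are left untouched."""
    for cell in notebook.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = _scrub(cell["outputs"], replacements)
    return notebook
```

Applying the longest prefix first keeps a more specific mapping (for example, an interpreter directory under the home folder) from being swallowed by a shorter one. The notebook would be loaded with `json.load` and written back with `json.dump` around this call.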

Dependency:
- Re-added the [redteam] extra to azure-ai-evaluation in pyproject.toml
  to pull PyRIT in for section 14.
corticalstack changed the title from "fix(13): drop customBlocklists from RAI policy (Responses API service bug workaround)" to "fix(13) + docs(14): Responses API blocklist workaround + section 14 intro + scan artifacts" on May 27, 2026
corticalstack merged commit e6c3598 into main on May 27, 2026
1 of 2 checks passed
corticalstack deleted the fix/responses-api-blocklist-service-bug branch on May 27, 2026 15:15