fix(13) + docs(14): Responses API blocklist workaround + section 14 intro + scan artifacts #16
Merged
…es API service bug

13-guardrails was returning HTTP 500 InternalServerError on every Responses API call to the contoso-bank-agent.

Root cause isolated empirically: when ANY customBlocklists entry is attached to the RAI policy (Prompt-side, Completion-side, or both), the Responses API runtime returns 500 on happy-path content while still correctly returning 400 content_filter on blocked content. The exact same policy works fine through Chat Completions. This is the service-side analogue of the Java SDK array-shape issue at Azure/azure-sdk-for-java#49196: the service returns custom_blocklists as a JSON array in content_filter_results, and the Responses runtime appears to have the same array-vs-object mismatch in its response assembly.

Fix: customBlocklists is now an empty list in 13-01's RAI policy body. The bank-demo-blocklist resource is still created (visible in portal, two-line re-attachment once Microsoft fixes the service). Standard content filters + Prompt Shields (Jailbreak / Indirect Attack / Protected Material) still work via the Responses API path used by every other agent notebook.

Documentation updates:
- 13-00-guardrails.md: Known limitation note up top + updated architecture diagram + revised portal-fallback steps
- 13-01: detailed comment on the RAI policy cell explaining the bug, the empirical evidence, the workaround, and how to re-enable
- 13-03: header note that PII / blocklist scenarios will not block until the service bug is fixed (clean banking + prompt injection still demo cleanly)
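A minimal sketch of the workaround's shape, for readers who have not opened 13-01. It is not the notebook's actual cell: the helper name with_blocklist is hypothetical, and the field names (basePolicyName, blocklistName, blocking, source) are assumed to follow the RAI policy resource shape. Only the customBlocklists key and the bank-demo-blocklist name come from this PR.

```python
from copy import deepcopy

# Assumed skeleton of an RAI policy body; the real body in 13-01 also carries
# content filter settings that are omitted here.
BASE_POLICY = {
    "properties": {
        "basePolicyName": "Microsoft.DefaultV2",  # assumption, not from the PR
        "customBlocklists": [],                   # the workaround: empty list
    }
}


def with_blocklist(policy, blocklist_name, attach):
    """Return a copy of the policy body with the blocklist attached or detached.

    attach=False reproduces the workaround (empty customBlocklists);
    attach=True sketches the re-attachment on both Prompt and Completion sides.
    """
    body = deepcopy(policy)
    if attach:
        body["properties"]["customBlocklists"] = [
            {"blocklistName": blocklist_name, "blocking": True, "source": source}
            for source in ("Prompt", "Completion")
        ]
    else:
        body["properties"]["customBlocklists"] = []
    return body


workaround_body = with_blocklist(BASE_POLICY, "bank-demo-blocklist", attach=False)
reattached_body = with_blocklist(BASE_POLICY, "bank-demo-blocklist", attach=True)
```

Keeping the toggle in one place means the re-enable step stays a small, reviewable change once the service is fixed.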
Add an "Important context" paragraph to the KNOWN ISSUE comment in 13-01 and the Known limitation callout in 13-00 to make explicit that the blocklist mechanism has been verified previously in this notebook series: the cached outputs of 13-03-demo-guardrails.ipynb were captured when customBlocklists was attached and show all 5 PII inputs + all 5 codename/competitor prompts blocking correctly through the Responses API. This matters because the service bug isn't a "the blocklist never worked here" situation - it is a regression in the Responses API runtime's response assembly. When Microsoft fixes the service and the blocklist is re-attached, no further demo verification is needed; the cached results are direct evidence the policy + blocklist combination behaves correctly.
…am artifacts
Section 13 cached outputs refreshed:
- 13-02-create-bank-agent.ipynb: ran against the customBlocklists-empty
policy, agent now at v2, smoke test returns a normal answer
- 13-03-demo-guardrails.ipynb: re-run produces the new expected behaviour
(clean + prompt-injection still pass; PII + blocklist categories
visibly do not block, consistent with the doc updates in the previous
commits)
Section 14 (red teaming):
- NEW 14-00-red-teaming.md section overview: introduces the AI Red
Teaming Agent (PyRIT), the region constraint, the two notebooks, the
callback/APIM architecture, and links to the official Microsoft docs
+ PyRIT GitHub. Brings section 14 in line with every other section's
NN-00-* intro page.
- Committed PyRIT scan output artifacts (redteam_basic_output/ and
redteam_advanced_output/{strategies,multilang,custom}/) so readers
can see what the scans produce without running them. ~110KB total,
verified to contain no tenant identifiers.
- Added custom_attack_prompts.json as the source-of-truth seed file for
the custom-objectives scan.
Cleanup:
- Renamed PyRIT scan_name arguments from legacy Lab16-* to descriptive
redteam-* in both notebooks (and in cached outputs / committed scan
JSON). The Lab16 prefix escaped the 0.8.0 "Lab N" cleanup pass.
- Scrubbed absolute local paths (/home/jp/...) from cached outputs and
Python stack traces (6 occurrences in 14-01, 80 in 14-02) to
<repo-root> and <uv-python> placeholders, per CONTRIBUTING.md
notebook-output hygiene policy.
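
The path scrub described above could be sketched roughly as follows. This is an illustration, not the script used for this PR: scrub_paths and the example directories are hypothetical, and only the <repo-root> and <uv-python> placeholders come from the commit.

```python
def scrub_paths(text, repo_root, python_root):
    """Replace absolute local path prefixes with neutral placeholders.

    Longer prefixes are replaced first so that an interpreter directory
    nested inside the repository is not swallowed by the <repo-root> rule.
    """
    replacements = [(repo_root, "<repo-root>"), (python_root, "<uv-python>")]
    replacements.sort(key=lambda pair: len(pair[0]), reverse=True)
    for prefix, placeholder in replacements:
        prefix = prefix.rstrip("/")
        if prefix:  # an empty prefix would match everywhere; skip it
            text = text.replace(prefix, placeholder)
    return text


# Hypothetical example paths, not the ones removed in this PR.
line = 'File "/home/example/repo/14-red-teaming/run.py", line 3'
scrubbed = scrub_paths(line, "/home/example/repo", "/home/example/repo/.venv")
```

Applied to every output and traceback string in a notebook's JSON, a helper like this would leave relative locations readable while dropping the machine-specific prefix.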
Dependency:
- Re-added the [redteam] extra to azure-ai-evaluation in pyproject.toml
to pull PyRIT in for section 14.
Summary
Bundles three concerns into one release. All work uses the established agent_reference/ Responses API pattern; no API regressions.

1. Guardrails: Responses API + customBlocklists service bug workaround

13-guardrails/13-02-create-bank-agent.ipynb and 13-03-demo-guardrails.ipynb returned InternalServerError: 500 on every Responses API call. Empirically isolated: when ANY customBlocklists entry is attached to the RAI policy (Prompt-side, Completion-side, or both), the Responses API runtime returns 500 on happy-path content while still correctly returning 400 content_filter on blocked content. Same policy works through Chat Completions. This is the service-side analogue of the Java SDK array-shape issue #49196.

Fix: customBlocklists is now an empty list in 13-01's RAI policy body. The bank-demo-blocklist resource is still created (visible in portal, two-line re-attachment once Microsoft fixes the service). Standard filters + Prompt Shields still work via the Responses API.

Detailed comment on the RAI policy cell in 13-01 captures the bug, the empirical evidence (including that the blocklist itself has been verified end-to-end via the cached 13-03 outputs from a previous run when it was attached), and the two-line re-enable path.

2. Section 14: new intro page + scan artifacts
- New 14-red-teaming/14-00-red-teaming.md: matches the NN-00-* intro page pattern every other section has. Covers PyRIT, the region constraint, the two notebooks, the callback/APIM architecture, and links to Microsoft docs + PyRIT GitHub.
- Committed redteam_basic_output/ and redteam_advanced_output/{strategies,multilang,custom}/ (~110KB total) as demo artifacts so readers can see what the scans produce without running them. No tenant identifiers in any of the JSON.
- Added custom_attack_prompts.json as the source-of-truth seed file for the custom-objectives scan.

3. Cleanup

- Lab16-* scan names → redteam-*: PyRIT scan_name arguments in 14-01 / 14-02 escaped the 0.8.0 "Lab N" cleanup pass. Renamed in source cells, cached outputs, and the committed scan JSON.
- Absolute local paths (/home/jp/...) removed from 14-01 / 14-02 cached outputs and tracebacks, replaced with <repo-root> and <uv-python> per CONTRIBUTING.md notebook-output hygiene policy.
- Re-added the [redteam] extra to azure-ai-evaluation in pyproject (pulls in PyRIT for section 14).

Patch release 0.8.9.
Test plan
- 13-02 smoke test returns a normal answer through the Responses API
- 13-03 clean banking + prompt-injection categories pass; PII + blocklist categories show as not-blocked, matching the updated category table
- 14-01 and 14-02 run end-to-end (with [redteam] extra installed)
- 14-00-red-teaming.md renders correctly, all internal links resolve
- Search for /home/jp and Lab16 returns no results
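
The last check in the test plan can be reproduced with a short script. This is a sketch under assumptions: find_leftovers and the chosen file suffixes are hypothetical, and the two search strings are the ones named in this PR.

```python
from pathlib import Path


def find_leftovers(root, needles=("/home/jp", "Lab16"),
                   suffixes=(".ipynb", ".json", ".md")):
    """Return (path, line number, needle) for every leftover occurrence.

    Only files with the given suffixes are read; an empty list means the
    tree is clean for the given search strings.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for needle in needles:
                if needle in line:
                    hits.append((str(path), lineno, needle))
    return hits
```

Calling find_leftovers("14-red-teaming") from the repository root and asserting the result is empty would make this check repeatable rather than a one-off manual search.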