356 changes: 356 additions & 0 deletions notebooks/azure-ai-search-custom-schema-citations.ipynb
@@ -0,0 +1,356 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6907b88f",
"metadata": {},
"source": [
"You have an Azure AI Search index whose URL, title, and content fields are **not** named\n",
"`url` / `title` / `content` — they're `blob_url`, `uid`, `snippet`, or whatever your blob or\n",
"SharePoint integrated-vectorization pipeline produced. You wire it into a Foundry Agent with the\n",
"**Azure AI Search tool**, the answers are great, but the `url_citation` annotations come back as\n",
"useless placeholders:\n",
"\n",
"```text\n",
"title='doc_0' url='https://<service>.search.windows.net/'\n",
"```\n",
"\n",
"**The pattern this recipe teaches:** register the index as a *project asset* with a `FieldMapping`,\n",
"then point the `AzureAISearchTool` at it via `index_asset_id`. The agent's citations then resolve to\n",
"your real fields. No re-indexing, no schema change, no touching the index.\n",
"\n",
"### What you'll do\n",
"\n",
"1. Register your existing index as a project asset with a `FieldMapping`\n",
"2. Create an agent that references the asset by `index_asset_id`\n",
"3. Ask a question and read citations that resolve to your real `url` / `title` fields\n",
"4. Learn the failure modes that produce `doc_0` placeholders and how to avoid each one\n",
"\n",
"By the end you have a copyable two-step (`create_or_update` + `index_asset_id`) you can drop into any\n",
"agent that grounds on a custom-schema Azure AI Search index."
]
},
{
"cell_type": "markdown",
"id": "a01e6271",
"metadata": {},
"source": [
"## 1 · Prerequisites\n",
"\n",
"| | |\n",
"|---|---|\n",
"| Microsoft Foundry project | A project endpoint and one chat deployment (e.g. `gpt-4.1`) |\n",
"| Azure AI Search | An existing index, connected to the project as a `CognitiveSearch` connection |\n",
"| Identity | `az login` — the notebook uses `DefaultAzureCredential` |\n",
"\n",
"You do **not** need to re-index or rename any fields. This recipe works against the schema you\n",
"already have.\n",
"\n",
"### Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4efa992a",
"metadata": {},
"outputs": [],
"source": [
"%pip install --quiet \"azure-ai-projects>=2.0.0\" \"azure-identity>=1.19.0\""
]
},
{
"cell_type": "markdown",
"id": "6a1981e5",
"metadata": {},
"source": [
"## 2 · Configure endpoints and your index's real field names\n",
"\n",
"Set these in your shell (or a local `.env`) and the cell below reads them. The three field names at\n",
"the bottom are the whole point — they are the part of your index that differs from the docs.\n",
"\n",
"```bash\n",
"PROJECT_ENDPOINT=https://<resource>.services.ai.azure.com/api/projects/<project>\n",
"SEARCH_CONNECTION_NAME=my-search-connection # the CognitiveSearch connection NAME, not its id\n",
"INDEX_NAME=my-custom-index # your existing index\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f94d2fe4",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"PROJECT_ENDPOINT = os.getenv(\"PROJECT_ENDPOINT\", \"https://<resource>.services.ai.azure.com/api/projects/<project>\")\n",
"SEARCH_CONNECTION_NAME = os.getenv(\"SEARCH_CONNECTION_NAME\", \"my-search-connection\") # connection NAME, not id\n",
"INDEX_NAME = os.getenv(\"INDEX_NAME\", \"my-custom-index\")\n",
"MODEL = os.getenv(\"MODEL\", \"gpt-4.1\")\n",
"\n",
"# Your index's real field names -- the part that differs from the docs.\n",
"URL_FIELD = os.getenv(\"URL_FIELD\", \"blob_url\") # your URL field -> annotation.url\n",
"TITLE_FIELD = os.getenv(\"TITLE_FIELD\", \"uid\") # your title field -> annotation.title\n",
"CONTENT_FIELD = os.getenv(\"CONTENT_FIELD\", \"snippet\") # your content field\n",
"\n",
"print(f\"project : {PROJECT_ENDPOINT}\")\n",
"print(f\"connection : {SEARCH_CONNECTION_NAME}\")\n",
"print(f\"index : {INDEX_NAME}\")\n",
"print(f\"fields : url={URL_FIELD!r} title={TITLE_FIELD!r} content={CONTENT_FIELD!r}\")"
]
},
{
"cell_type": "markdown",
"id": "ceec87b3",
"metadata": {},
"source": [
"## 3 · Create the client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41e1f5b1",
"metadata": {},
"outputs": [],
"source": [
"from azure.identity import DefaultAzureCredential\n",
"from azure.ai.projects import AIProjectClient\n",
"\n",
"project = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=DefaultAzureCredential())\n",
"openai = project.get_openai_client()\n",
"print(\"client created\")"
]
},
{
"cell_type": "markdown",
"id": "1e442bfa",
"metadata": {},
"source": [
"## 4 · Register the index as an asset **with a field mapping**\n",
"\n",
"This is the step that makes citations work. `FieldMapping` maps your custom fields onto the citation\n",
"slots the tool understands. The mapping lives on the **registered asset** — not on the tool (see the\n",
"Gotchas table for why that distinction matters)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbbb4f13",
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.projects.models import AzureAISearchIndex, FieldMapping\n",
"\n",
"ASSET_NAME, ASSET_VERSION = \"my-custom-index-mapped\", \"1\"\n",
"\n",
"asset = project.indexes.create_or_update(\n",
" name=ASSET_NAME, version=ASSET_VERSION,\n",
" index=AzureAISearchIndex(\n",
" name=ASSET_NAME, version=ASSET_VERSION,\n",
" connection_name=SEARCH_CONNECTION_NAME, # connection NAME\n",
" index_name=INDEX_NAME,\n",
" field_mapping=FieldMapping(\n",
" content_fields=[CONTENT_FIELD], # required\n",
" url_field=URL_FIELD, # -> annotation.url\n",
" title_field=TITLE_FIELD, # -> annotation.title\n",
" # filepath_field=\"...\", # optional\n",
" ),\n",
" ),\n",
")\n",
"print(f\"registered asset {ASSET_NAME}/versions/{ASSET_VERSION}\")"
]
},
{
"cell_type": "markdown",
"id": "78fa244a",
"metadata": {},
"source": [
"## 5 · Create the agent, referencing the asset by `index_asset_id`\n",
"\n",
"> ⚠️ `index_asset_id` **must** be `\"<name>/versions/<version>\"`, and it is **mutually exclusive**\n",
"> with `project_connection_id` + `index_name`. Set **only** `index_asset_id`, or the service rejects\n",
"> the request with `Multiple values specified for oneof knowledge_index`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6d55723",
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.projects.models import (\n",
" AzureAISearchTool, AzureAISearchToolResource, AISearchIndexResource,\n",
" AzureAISearchQueryType, PromptAgentDefinition,\n",
")\n",
"\n",
"agent = project.agents.create_version(\n",
" agent_name=\"search-custom-schema\",\n",
" definition=PromptAgentDefinition(\n",
" model=MODEL,\n",
" instructions=(\n",
" \"Answer only from the Azure AI Search tool. Always cite sources, \"\n",
" \"rendered as [message_idx:search_idx†source].\"\n",
" ),\n",
" tools=[AzureAISearchTool(azure_ai_search=AzureAISearchToolResource(indexes=[\n",
" AISearchIndexResource(\n",
" index_asset_id=f\"{ASSET_NAME}/versions/{ASSET_VERSION}\",\n",
" query_type=AzureAISearchQueryType.SEMANTIC, # or VECTOR_SEMANTIC_HYBRID\n",
" top_k=5,\n",
" )\n",
" ]))],\n",
" ),\n",
")\n",
"print(f\"agent {agent.name} v{agent.version}\")"
]
},
{
"cell_type": "markdown",
"id": "e5f68d84",
"metadata": {},
"source": [
"## 6 · Ask a question and read the citations\n",
"\n",
"Stream a response and pull the `url_citation` annotations off the final message. With the mapping in\n",
"place, `title` and `url` now carry your real field values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d087fd4b",
"metadata": {},
"outputs": [],
"source": [
"stream = openai.responses.create(\n",
" stream=True, tool_choice=\"required\",\n",
" input=\"What does the P4324 do?\",\n",
" extra_body={\"agent_reference\": {\"name\": agent.name, \"type\": \"agent_reference\"}},\n",
")\n",
"\n",
"for event in stream:\n",
" if event.type == \"response.output_text.delta\":\n",
" print(event.delta, end=\"\")\n",
" elif event.type == \"response.output_item.done\":\n",
" item = event.item\n",
" if item.type == \"message\" and item.content:\n",
" last = item.content[-1]\n",
" if getattr(last, \"type\", None) == \"output_text\":\n",
" for a in (last.annotations or []):\n",
" if a.type == \"url_citation\":\n",
" print(f\"\\nCITATION title={a.title!r} url={a.url!r}\")"
]
},
{
"cell_type": "markdown",
"id": "711a6c33",
"metadata": {},
"source": [
"**Expected output** — your real fields now surface instead of `doc_0` placeholders:\n",
"\n",
"```text\n",
"CITATION title='P4324 Programmable Flow Controller — Overview'\n",
" url='https://contoso-docs.example.com/p4324/overview'\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "7ddaee5b",
"metadata": {},
"source": [
"## 7 · Clean up (optional)\n",
"\n",
"Delete the agent version. Keep the asset if you want to reuse the mapping for other agents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60909bf8",
"metadata": {},
"outputs": [],
"source": [
"project.agents.delete_version(agent_name=agent.name, agent_version=agent.version)\n",
"# project.indexes.delete(name=ASSET_NAME, version=ASSET_VERSION) # keep it to reuse the mapping\n",
"print(\"cleaned up agent\")"
]
},
{
"cell_type": "markdown",
"id": "874a873c",
"metadata": {},
"source": [
"## Gotchas\n",
"\n",
"Every row here is a failure mode that produces broken or placeholder citations.\n",
"\n",
"| Symptom | Cause | Fix |\n",
"|---|---|---|\n",
"| `title=\"doc_0\"`, `url=https://<svc>.search.windows.net/` | Direct `project_connection_id` + `index_name` path — citations only read literal `url` / `title` fields | Use the `index_asset_id` + `FieldMapping` path above |\n",
"| `Invalid IndexId format` | `index_asset_id` was a bare name or `name:1` | Must be `\"<name>/versions/<version>\"` (e.g. `.../versions/1` or `.../versions/latest`) |\n",
"| `Multiple values specified for oneof knowledge_index` | Set both `index_asset_id` and `project_connection_id` / `index_name` | Set **only** `index_asset_id` |\n",
"| Field mapping ignored | Passed `parameters.field_mapping` as a dict on the tool | That key is silently dropped; the mapping must live on the **registered asset**, not the tool |\n",
"| Answer is right but citations wrong | The tool concatenates content regardless of field names, so answers work even when citations don't | The mapping fixes citations specifically |\n",
"\n",
"**Alternative (no asset registration):** rename or alias your URL and title fields to literally `url`\n",
"and `title` in the index (indexer output field mappings, or write both on push). The direct path then\n",
"works too. Prefer the asset + `FieldMapping` route when you can't touch the index."
]
},
{
"cell_type": "markdown",
"id": "f905039e",
"metadata": {},
"source": [
"## Verified run (real output)\n",
"\n",
"Ran these steps **verbatim** on 2026-06-05 against a real index `azstool-e2e-custom`\n",
"(fields `id`, `uid`, `blob_url`, `snippet`) on a live Foundry project, starting from a fresh asset\n",
"registration. Actual console output:\n",
"\n",
"```text\n",
"[step 1] client created\n",
"[step 2] registered asset cookbook-verify-mapped/versions/1:\n",
" {'type': 'AzureSearch', 'connectionName': 'fsunavala-srch-demos-prod',\n",
" 'indexName': 'azstool-e2e-custom',\n",
" 'fieldMapping': {'contentFields': ['snippet'], 'titleField': 'uid', 'urlField': 'blob_url'},\n",
" 'name': 'cookbook-verify-mapped', 'version': '1'}\n",
"[step 3] agent search-custom-schema v1\n",
"[step 4] streamed answer + citations:\n",
"\n",
"The P4324 is a programmable industrial flow controller designed to regulate the flow rate of\n",
"liquids and gases in process pipelines. It does this by modulating a built-in proportional valve.\n",
"The device takes 4-20mA and Modbus RTU setpoints and can maintain flow to within +/- 0.5 percent\n",
"of the target value【4:0†source】.\n",
"\n",
"CITATION title='P4324 Programmable Flow Controller — Overview' url='https://contoso-docs.example.com/p4324/overview'\n",
"\n",
"[step 5] cleaned up agent + asset\n",
"```\n",
"\n",
"**Confirmed:** `title` resolved from the index's `uid` field and `url` from its `blob_url` field —\n",
"no `doc_0` placeholder, no `https://<svc>.search.windows.net/` fallback. The same index referenced\n",
"*directly* (without the asset + `FieldMapping`) returns `title='doc_0'`,\n",
"`url='https://<svc>.search.windows.net/'` — the broken baseline this recipe fixes."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
15 changes: 15 additions & 0 deletions registry.yaml
@@ -189,3 +189,18 @@
    - mcp
    - agents
    - agent-service

- slug: azure-ai-search-custom-schema-citations
  path: notebooks/azure-ai-search-custom-schema-citations.ipynb
  title: "Fix Agent Citations for Custom Azure AI Search Schemas"
  description: "Get real url/title citations from the Azure AI Search tool when your index fields aren't named url/title/content — register the index as an asset with a FieldMapping and use index_asset_id."
  date: "2026-06-05"
  authors:
    - github: farzad528
  tags:
    - azure-ai-search
    - agents
    - agent-service
    - tools
    - grounding
    - retrieval