The Layered Guardrail Architecture
Across Azure AI deployments in regulated industries, the pattern is often the same: the team nails the RAG pipeline, the vector store, and the streaming UI—and then ships with zero guardrails. This was always a risky move, but the EU AI Act, now in force, has changed the risk calculus permanently. What was once a 'nice-to-have' is now a hard compliance requirement for any serious enterprise LLM application.
Building production-grade LLM applications isn't just about getting the right answer; it's about ensuring the safe and responsible answer. Organisations, particularly those in finance, healthcare, and the public sector across Europe, are grappling with model hallucinations, prompt injection attacks, the generation of harmful content, and intellectual property infringement. These aren't abstract risks; they translate directly into regulatory fines, reputational damage, and a complete loss of user trust.
This guide provides a comprehensive, code-first approach to building a robust Responsible AI safety layer on Azure. We'll move beyond marketing concepts and dive into concrete engineering controls using two complementary paths: the standalone Azure AI Content Safety analyze API for fast, policy-driven harm-category screening (and optional jailbreak-style signals via shield_prompt on that API), and Azure OpenAI Service for deployment-level controls—Prompt Shields, Groundedness Detection, and Protected Material Detection—that you enable and read from the Azure OpenAI completion response, not from ContentSafetyClient alone. I'll show you how to compose these into a single, modular FastAPI middleware chain, provision the entire stack with Terraform, and map each guardrail to its corresponding obligation under the EU AI Act.
This is the pillar article for our 'Responsible AI Guardrails with Azure AI Foundry' series. It introduces all the key guardrails with standalone, copy-paste code examples and will link out to dedicated spoke articles for even deeper dives.
When I architect these systems on Azure, I don't treat LLM safety as a single component. It's a layered, defense-in-depth model. The core idea is to implement a sequence of checks—some before the LLM is ever called, some after it generates a response—to ensure that both user inputs and model outputs adhere to our predefined safety policies. It's a pipeline of trust.
Our architecture uses two key Azure services, composed in a specific order:
- Azure AI Content Safety: A standalone, high-performance service we use as the first line of defense. Through ContentSafetyClient.analyze_text, it scores user input against harm categories (hate, sexual, violence, self-harm) and can raise jailbreak / indirect-attack signals when shield_prompt is enabled—before the text reaches the expensive LLM.
- Azure OpenAI Service: The LLM and its deployment-level filters. Prompt Shields (user-prompt attacks), Groundedness Detection, and Protected Material Detection are integrated with this service: you configure them for the deployment and interpret the annotations returned on the Azure OpenAI API response alongside the completion.
The diagram below illustrates this layered flow from user request to final, safe application response.
This composable model, which we'll implement as a FastAPI middleware, is powerful because it allows for environment-specific configurations. Your dev environment can have permissive thresholds for testing, while your production environment remains locked down and highly secure.
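To make that concrete, here is one minimal way to express per-environment policies in Python. The GuardrailPolicy class, the APP_ENV variable, and the threshold values are illustrative assumptions for this article, not an Azure-defined schema:

```python
import os
from dataclasses import dataclass, field


# Hypothetical policy object; field names and values are this article's
# convention, not an official Azure schema.
@dataclass(frozen=True)
class GuardrailPolicy:
    harm_thresholds: dict = field(default_factory=dict)  # category -> min severity that blocks
    block_on_jailbreak: bool = True
    min_groundedness_score: float = 0.5


POLICIES = {
    "dev": GuardrailPolicy(
        # Permissive: only block the most severe content while testing
        harm_thresholds={"Hate": 6, "Sexual": 6, "Violence": 6, "SelfHarm": 6},
        block_on_jailbreak=False,  # log-only while red-teaming in dev
    ),
    "prod": GuardrailPolicy(
        # Locked down: low severities already trigger a block
        harm_thresholds={"Hate": 2, "Sexual": 2, "Violence": 2, "SelfHarm": 4},
    ),
}


def active_policy() -> GuardrailPolicy:
    """Pick the policy for the current environment (APP_ENV, defaulting to prod)."""
    return POLICIES[os.environ.get("APP_ENV", "prod")]
```

The same middleware code then reads its thresholds from `active_policy()`, so promoting from dev to prod is a configuration change, not a code change.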
Prerequisites
Before we write a line of code, let's get our environment set up. This is the standard toolkit I use for all my Azure-based AI projects.
- Azure CLI: Make sure you have the CLI installed and are authenticated to the correct Azure subscription.
az login
az account set --subscription "your-azure-subscription-id"
- Terraform CLI: We'll use Terraform for declarative infrastructure provisioning. I'm using version 1.5+, but any recent version should work.
terraform --version
- Python 3.12+: Our application layer is built exclusively with Python. I insist on using a virtual environment for every project to manage dependencies cleanly.
python3.12 -m venv .venv
source .venv/bin/activate
python3.12 --version
- Required Python Packages: Install the necessary Azure SDKs, FastAPI for our web layer, and a few utilities.
pip install "azure-ai-contentsafety==1.0.0b2" "azure-identity>=1.15.0" "fastapi>=0.110.0" "uvicorn[standard]>=0.29.0" "python-dotenv>=1.0.0" "openai>=1.23.0"
- Environment Variables: We use environment variables for configuration. Create a .env file in your project root. DefaultAzureCredential will use these for local development and seamlessly switch to Managed Identity in Azure.
# .env file
AZURE_TENANT_ID="your-tenant-id"
AZURE_CLIENT_ID="your-service-principal-app-id"
AZURE_CLIENT_SECRET="your-service-principal-password"
AZURE_SUBSCRIPTION_ID="your-azure-subscription-id"
# Endpoints from Terraform output
AZURE_CONTENT_SAFETY_ENDPOINT="https://your-content-safety-resource.cognitiveservices.azure.com/"
AZURE_OPENAI_ENDPOINT="https://your-aoai-resource.openai.azure.com/"
# Azure OpenAI Configuration
AZURE_OPENAI_API_VERSION="2024-05-01-preview"
AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o-demo"
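Before wiring up any clients, I like to fail fast on missing configuration. A small, hypothetical helper (the variable list mirrors the .env file above):

```python
import os

# Settings the application cannot start without; mirrors the .env file above.
REQUIRED_VARS = [
    "AZURE_CONTENT_SAFETY_ENDPOINT",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_settings(env: dict) -> list:
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def assert_configured() -> None:
    """Raise at startup if the process environment is incomplete."""
    missing = missing_settings(dict(os.environ))
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
```

Calling assert_configured() at application startup turns a confusing mid-request failure into an immediate, explicit error.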
Security Best Practice: Managed Identities
While I've shown service principal credentials here for local development, in any real deployment (staging, production), I *always* use Azure Managed Identities. This eliminates the need to manage client secrets entirely. DefaultAzureCredential is smart enough to detect when it's running in an Azure environment (like an App Service or VM) with a managed identity assigned and will use it automatically. It's the most secure and frictionless way to authenticate to Azure services.
With our environment ready, let's provision the cloud infrastructure we need.
Terraform: Provisioning the AI Safety Stack
First things first: we need to create the Azure resources. I use Terraform for this to ensure our infrastructure is repeatable, version-controlled, and documented as code. We will provision everything in francecentral. That region is on Microsoft's current list where Groundedness Detection is available (alongside regions such as East US and Canada East); several EU-adjacent regions are not on that list, so picking a supported region avoids silent failures when you enable groundedness in code.
This configuration will create:
1. A Resource Group to contain our services.
2. An Azure Machine Learning Workspace, which acts as our AI Foundry hub.
3. A standalone Azure AI Content Safety account.
4. An Azure OpenAI account with a gpt-4o deployment.
5. The necessary Role Assignment to allow the ML Workspace to access the OpenAI service.
Here is the complete main.tf file:
# main.tf
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Pinned to the 3.x series: the scale block on azurerm_cognitive_deployment
      # below was replaced by a sku block in the 4.x provider.
      version = "~> 3.90"
    }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "rg" {
  name     = "rg-ai-foundry-guardrails-francecentral"
  location = "francecentral"
}

# Supporting resources required by the ML workspace (application_insights_id,
# key_vault_id and storage_account_id are mandatory arguments).
resource "azurerm_application_insights" "appi" {
  name                = "appi-aifoundry-frc"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  application_type    = "web"
}

resource "azurerm_key_vault" "kv" {
  name                = "kv-aifoundry-frc" # must be globally unique; adjust
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"
}

resource "azurerm_storage_account" "sa" {
  name                     = "staifoundryfrc" # must be globally unique; adjust
  location                 = azurerm_resource_group.rg.location
  resource_group_name      = azurerm_resource_group.rg.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

# 1. AI Foundry Hub (Azure AI Workspace)
resource "azurerm_machine_learning_workspace" "ai_foundry_hub" {
  name                    = "mlw-aifoundry-hub-frc"
  location                = azurerm_resource_group.rg.location
  resource_group_name     = azurerm_resource_group.rg.name
  sku_name                = "Premium" # Premium SKU for advanced features
  application_insights_id = azurerm_application_insights.appi.id
  key_vault_id            = azurerm_key_vault.kv.id
  storage_account_id      = azurerm_storage_account.sa.id

  identity {
    type = "SystemAssigned"
  }

  tags = {
    environment = "production"
    project     = "AI_Foundry_Guardrails"
  }
}

# 2. Standalone Content Safety Service
resource "azurerm_cognitive_account" "content_safety" {
  name                  = "cogs-contentsafety-guardrails-frc"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  kind                  = "ContentSafety"
  sku_name              = "S0"
  # A custom subdomain is required for Entra ID (DefaultAzureCredential) auth
  custom_subdomain_name = "cogs-contentsafety-guardrails-frc"

  tags = {
    environment = "production"
    project     = "AI_Foundry_Guardrails"
  }
}

# 3. Azure OpenAI Service Account
resource "azurerm_cognitive_account" "openai" {
  name                  = "cogs-openai-guardrails-frc"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  kind                  = "OpenAI"
  sku_name              = "S0"
  custom_subdomain_name = "cogs-openai-guardrails-frc"
}

# 4. Azure OpenAI Deployment (e.g., GPT-4o)
resource "azurerm_cognitive_deployment" "gpt4o" {
  name                 = "gpt-4o-demo" # This must match your AZURE_OPENAI_DEPLOYMENT_NAME env var
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-05-13"
  }

  scale {
    type = "Standard"
  }
}

# 5. RBAC: Granting the ML workspace access to OpenAI
resource "azurerm_role_assignment" "mlw_to_openai" {
  scope                = azurerm_cognitive_account.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_machine_learning_workspace.ai_foundry_hub.identity[0].principal_id
}

# Outputs for our .env file
output "content_safety_endpoint" {
  description = "Endpoint for the Azure AI Content Safety service."
  value       = azurerm_cognitive_account.content_safety.endpoint
}

output "openai_endpoint" {
  description = "Endpoint for the Azure OpenAI service."
  value       = azurerm_cognitive_account.openai.endpoint
}
Run terraform init, terraform plan, and terraform apply to create these resources. Once complete, copy the output values into your .env file.
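If you script that hand-off, a small helper can translate the JSON from terraform output -json into the .env keys used in this guide. The key mapping below is purely this article's convention:

```python
import json

# Maps Terraform output names to the .env variable names used in this guide.
# This mapping is the article's convention, not a Terraform feature.
OUTPUT_TO_ENV = {
    "content_safety_endpoint": "AZURE_CONTENT_SAFETY_ENDPOINT",
    "openai_endpoint": "AZURE_OPENAI_ENDPOINT",
}


def tf_outputs_to_env(tf_output_json: str) -> str:
    """Render 'terraform output -json' as .env lines for the variables above."""
    outputs = json.loads(tf_output_json)
    lines = [
        f'{env_key}="{outputs[tf_key]["value"]}"'
        for tf_key, env_key in OUTPUT_TO_ENV.items()
        if tf_key in outputs
    ]
    return "\n".join(lines)
```

Pipe the output of terraform output -json into this function (or paste it as a string) and append the result to your .env file.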
Guardrail Implementation: The Code
Now we'll implement each guardrail as a distinct component in Python. This modular approach makes the system easier to test, maintain, and configure.
Guardrail 0: The Safety System Message
Before any other check, our first line of defense is the system message we send to the LLM. This is where we define the model's persona, scope, and core safety instructions. A well-crafted system message can prevent a huge range of undesirable behaviors at the source.
For a RAG application, I recommend a pattern that explicitly instructs the model to rely only on the provided context, to refuse to answer if the context is insufficient, and to adopt a safe, helpful persona.
# prompts/system_prompts.py
def create_rag_system_message(company_name: str = "Contoso Inc.") -> str:
    """Creates a robust system message for a RAG assistant."""
    return f"""
You are a helpful and harmless AI assistant for {company_name}.
Your primary function is to answer questions based *only* on the provided context documents.

**Core Instructions:**
1. **Strict Grounding:** Base your entire answer on the information contained within the documents provided in the 'CONTEXT' section. Do not use any external knowledge or information you were trained on.
2. **Cite Sources:** When you use information from a document, cite it using the document's ID (e.g., [doc-1]).
3. **Refuse if Unrelated:** If the user's question cannot be answered using the provided context, you MUST respond with: 'I'm sorry, but I cannot answer that question based on the information I have.' Do not try to guess or infer an answer.
4. **Safety First:** Do not engage in any harmful, unethical, discriminatory, or offensive behavior. Do not generate content related to violence, hate speech, self-harm, or sexually explicit topics. If a user asks for such content, politely refuse.
5. **Persona:** Be professional, polite, and objective.
"""


def format_user_prompt_with_context(user_question: str, context_documents: list[dict]) -> str:
    """Formats the final prompt sent to the model, combining retrieved context with the user's question."""
    context_str = "\n".join([f"[doc-{i+1}] {doc['content']}" for i, doc in enumerate(context_documents)])
    return f"""
**CONTEXT:**
{context_str}
**QUESTION:**
{user_question}
"""
This metaprompt sets clear boundaries before the model even starts generating tokens.
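A quick way to sanity-check the wire format is to render the prompt locally. Here the formatting helper from prompts/system_prompts.py is inlined so the snippet runs standalone, and the document content is made up for illustration:

```python
def format_user_prompt_with_context(user_question: str, context_documents: list) -> str:
    # Same helper as in prompts/system_prompts.py above, inlined so this
    # snippet runs on its own.
    context_str = "\n".join(
        f"[doc-{i+1}] {doc['content']}" for i, doc in enumerate(context_documents)
    )
    return f"""
**CONTEXT:**
{context_str}
**QUESTION:**
{user_question}
"""


# Illustrative document and question (not from a real knowledge base)
docs = [{"content": "Contoso's refund window is 30 days."}]
prompt = format_user_prompt_with_context(
    "How long do I have to request a refund?", docs
)
print(prompt)
```

Rendering the prompt this way before calling the model makes it easy to spot citation-ID mistakes or context that never made it into the final string.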
Guardrail 1: Input Analysis with Azure AI Content Safety
Next, we build a service to pre-screen every user prompt with the Azure AI Content Safety analyze API (ContentSafetyClient). This is a critical step to block malicious or harmful input before it gets processed by the LLM.
Within that single API call we combine:
- Harm categories: Scanning for Hate, Sexual, Violence, and Self-Harm content against thresholds you choose (severity values follow the current Content Safety API contract—confirm allowed ranges in Microsoft Learn for your API version).
- shield_prompt: Jailbreak and indirect-attack signals exposed by the Content Safety analyze API when shield_prompt=True. This is not the same as Prompt Shields on an Azure OpenAI deployment; treat those as an additional, model-side layer you inspect from the Azure OpenAI response (for example, prompt-filter annotations) alongside the completion body.
Here’s the service class implementation:
# services/content_safety_service.py
import os

from azure.ai.contentsafety.aio import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.exceptions import HttpResponseError
# The async client needs the async credential from azure.identity.aio
from azure.identity.aio import DefaultAzureCredential


class PreemptiveContentSafety:
    def __init__(self):
        endpoint = os.environ.get("AZURE_CONTENT_SAFETY_ENDPOINT")
        if not endpoint:
            raise ValueError("AZURE_CONTENT_SAFETY_ENDPOINT is not set.")
        # Use DefaultAzureCredential, which handles Managed Identity in prod
        self.client = ContentSafetyClient(endpoint, DefaultAzureCredential())

    async def analyze_input(self, prompt: str, thresholds: dict[TextCategory, int]) -> tuple[bool, dict]:
        """
        Analyzes input text for jailbreak attacks and harm categories.

        Args:
            prompt: The user input text.
            thresholds: A dictionary mapping TextCategory to the minimum severity that
                should trigger a block, using the integer scale returned by the
                Content Safety API for your version.

        Returns:
            A tuple (is_safe, analysis_details).
        """
        request = AnalyzeTextOptions(
            text=prompt,
            categories=list(thresholds.keys()),
            shield_prompt=True  # Enable jailbreak and indirect-attack detection
        )
        try:
            response = await self.client.analyze_text(request)
        except HttpResponseError as e:
            print(f"Content Safety analysis failed: {e}")
            # Fail open or closed? In a high-risk environment, I'd fail closed.
            return False, {"error": f"Content Safety API error: {e.message}"}

        # 1. Check shield_prompt (jailbreak / indirect attack) results from Content Safety
        if response.shield_prompt_result and response.shield_prompt_result.attack_detected:
            return False, {"reason": "jailbreak_attack", "confidence": "high"}

        # 2. Check harm-category results against thresholds
        violated_categories = {}
        if response.categories_analysis:
            for analysis in response.categories_analysis:
                if analysis.severity is not None and analysis.severity >= thresholds.get(analysis.category, 7):
                    violated_categories[analysis.category.value] = analysis.severity

        if violated_categories:
            return False, {"reason": "harm_category_violation", "details": violated_categories}

        return True, {"reason": "safe"}
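The blocking rule at the heart of analyze_input is easy to unit-test without calling Azure if you mirror it as a pure function over (category, severity) pairs. This is a hypothetical test helper for this article, not part of the SDK:

```python
def violates_thresholds(categories_analysis, thresholds, default=7):
    """Mirror of the severity check in PreemptiveContentSafety.analyze_input.

    categories_analysis: list of (category, severity) pairs, shaped like the
    API's categories_analysis result. Returns the categories whose severity
    meets or exceeds the configured block threshold.
    """
    return {
        category: severity
        for category, severity in categories_analysis
        if severity is not None and severity >= thresholds.get(category, default)
    }
```

Keeping the rule in a pure function like this lets you pin the blocking behaviour down in fast unit tests, while integration tests against the live API stay few and targeted.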
Guardrail 2 & 3: Integrated filters on Azure OpenAI (Prompt Shields, Groundedness, Protected Material)
After an input passes our Content Safety pre-check, we call Azure OpenAI. Deployment-level Prompt Shields run as part of that service; consult the completion and prompt-filter metadata from the Azure OpenAI API for user-prompt attack signals. For output analysis in this walkthrough we focus on:
- Groundedness Detection: Checks if the model's response is based on the source material we provided in the prompt (our RAG context). This is our primary defense against hallucinations.
- Protected Material Detection: Scans the output for text or code that matches known third-party intellectual property.
We enable these per request by adding the extra_body parameter to our openai client call; the payload schema for these detectors is preview functionality, so confirm the field names against Microsoft Learn for your API version.
# services/openai_service.py
import os

from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider
from openai import AsyncAzureOpenAI

from prompts.system_prompts import create_rag_system_message, format_user_prompt_with_context


class GuardedOpenAIService:
    def __init__(self):
        # The openai SDK does not pick up DefaultAzureCredential on its own;
        # we pass an Entra ID token provider scoped to Cognitive Services.
        token_provider = get_bearer_token_provider(
            DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
        )
        self.client = AsyncAzureOpenAI(
            api_version=os.environ["AZURE_OPENAI_API_VERSION"],
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            azure_ad_token_provider=token_provider,
        )
        self.deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
        self.system_message = create_rag_system_message()

    async def get_grounded_completion(self, user_question: str, grounding_docs: list[str]) -> dict:
        """
        Calls Azure OpenAI with Groundedness and Protected Material detectors enabled.
        """
        formatted_prompt = format_user_prompt_with_context(
            user_question,
            [{'content': doc} for doc in grounding_docs]
        )
        try:
            response = await self.client.chat.completions.create(
                model=self.deployment_name,
                messages=[
                    {"role": "system", "content": self.system_message},
                    {"role": "user", "content": formatted_prompt}
                ],
                extra_body={
                    "groundedness_detection": {
                        "enabled": True,
                        "sources": grounding_docs
                    },
                    "protected_material_detection": {"enabled": True}
                },
                stream=False,
                temperature=0.0
            )
            return self.parse_response(response)
        except Exception as e:
            print(f"Azure OpenAI call failed: {e}")
            return {"error": str(e)}

    def parse_response(self, response) -> dict:
        """
        Parses the AOAI response to extract content and safety analysis.
        """
        choice = response.choices[0]
        content = choice.message.content
        safety_results = {}

        # Azure returns content_filter_results as a plain annotation dict on
        # each choice, so we read it (and its nested objects) with dict access.
        filter_results = getattr(choice, "content_filter_results", None) or {}

        # Groundedness check
        groundedness = filter_results.get('groundedness')
        if groundedness:
            safety_results['groundedness'] = {
                'detected': groundedness.get('detected', False),
                'score': groundedness.get('score'),
                'ungrounded_segments': [
                    {'segment': seg.get('segment'), 'sources': seg.get('sources')}
                    for seg in groundedness.get('ungrounded_segments') or []
                ]
            }

        # Protected material check
        protected_text = filter_results.get('protected_material_text')
        if protected_text and protected_text.get('filtered'):
            safety_results['protected_material_text'] = True

        protected_code = filter_results.get('protected_material_code')
        if protected_code and protected_code.get('filtered'):
            citation = protected_code.get('citation') or {}
            safety_results['protected_material_code'] = {
                'filtered': True,
                'citation': citation.get('URL', 'N/A')
            }

        return {"content": content, "safety_analysis": safety_results}
Notice how we check for ungrounded_segments. In a production system, I use this information to either append a warning to the user or, in high-stakes scenarios, to block the response and flag it for human review. For protected material, the best practice is to include the citation if available or block the response to avoid IP infringement.
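One way to centralise that decision logic is a small verdict function over the safety_analysis dict built by parse_response. The precedence here (protected material blocks, ungroundedness warns) follows the recommendations above, but the function itself is an illustrative sketch:

```python
from enum import Enum


class Verdict(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


def decide(safety_analysis: dict) -> Verdict:
    """Illustrative post-processing policy over the safety_analysis dict.

    Precedence: protected material always blocks (IP risk), detected
    ungroundedness warns, everything else passes through.
    """
    if safety_analysis.get("protected_material_text") or safety_analysis.get("protected_material_code"):
        return Verdict.BLOCK  # never ship IP-matched text or code
    groundedness = safety_analysis.get("groundedness", {})
    if groundedness.get("detected"):  # ungrounded segments were found
        return Verdict.WARN
    return Verdict.ALLOW
```

Isolating the policy in one function makes it trivial to unit-test and to vary per environment (for example, escalating WARN to BLOCK in a high-stakes deployment).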
Composing the Guardrail Chain with FastAPI Middleware
Now, we bring it all together. A FastAPI middleware is the perfect place to orchestrate this chain of guardrails. It allows us to intercept every incoming request to our chat endpoint, apply our safety checks, and modify or block the response before it ever reaches the user.
This implementation defines a ResponsibleAIMiddleware class that executes our pre- and post-processing logic.
# main.py
import json

from fastapi import FastAPI, HTTPException, Request, Response
from starlette.middleware.base import BaseHTTPMiddleware, RequestResponseEndpoint
from starlette.responses import JSONResponse

from azure.ai.contentsafety.models import TextCategory
from services.content_safety_service import PreemptiveContentSafety
from services.openai_service import GuardedOpenAIService

# --- App and Service Initialization ---
app = FastAPI(
    title="Secure AI Chat API",
    description="An API for chat completions with Responsible AI guardrails."
)
safety_service = PreemptiveContentSafety()
openai_service = GuardedOpenAIService()

# --- Middleware Configuration ---
# In a real app, load this from a config file or env vars
PROD_HARM_THRESHOLDS = {
    TextCategory.HATE: 2,
    TextCategory.SEXUAL: 2,
    TextCategory.VIOLENCE: 2,
    TextCategory.SELF_HARM: 4,
}


class ResponsibleAIMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next: RequestResponseEndpoint) -> Response:
        if request.url.path != "/chat/invoke":
            return await call_next(request)

        try:
            # Recent Starlette versions cache the body read here, so the
            # downstream endpoint can still call request.json() itself.
            body = await request.json()
            user_prompt = body.get("prompt")
            if not user_prompt:
                return JSONResponse(status_code=400, content={"detail": "'prompt' field is required."})
        except json.JSONDecodeError:
            return JSONResponse(status_code=400, content={"detail": "Invalid JSON body."})

        # === GUARDRAIL CHAIN: PRE-PROCESSING ===
        # Note: HTTPExceptions raised inside BaseHTTPMiddleware bypass FastAPI's
        # exception handlers, so we return JSONResponse objects directly.
        is_safe, analysis = await safety_service.analyze_input(user_prompt, PROD_HARM_THRESHOLDS)
        if not is_safe:
            return JSONResponse(
                status_code=400,
                content={"error": "Input rejected by content safety filter", "details": analysis}
            )

        # If input is safe, proceed to the actual endpoint
        response = await call_next(request)

        # === GUARDRAIL CHAIN: POST-PROCESSING ===
        if response.status_code == 200:
            response_body = b''
            async for chunk in response.body_iterator:
                response_body += chunk
            response_data = json.loads(response_body)
            safety_analysis = response_data.get("safety_analysis", {})

            # Check for protected material
            if safety_analysis.get('protected_material_text') or safety_analysis.get('protected_material_code'):
                # For this example, we block. You could also replace with a citation.
                return JSONResponse(
                    status_code=400,
                    content={"error": "Response blocked due to protected material detection."}
                )

            # Check for ungroundedness: 'detected' is True when ungrounded
            # content was found, and a low score means weak grounding.
            groundedness = safety_analysis.get('groundedness', {})
            score = groundedness.get('score')
            if groundedness.get('detected', False) or (score is not None and score < 0.5):
                # Append a warning instead of blocking
                response_data['content'] += "\n\n[Warning: This response may contain information not present in the source documents and should be verified.]"
                response_data['safety_analysis']['warning'] = 'low_groundedness_score'
            return JSONResponse(content=response_data)

        return response


app.add_middleware(ResponsibleAIMiddleware)


# --- API Endpoint ---
@app.post("/chat/invoke")
async def invoke_chat(request: Request):
    """
    This endpoint is protected by the ResponsibleAIMiddleware.
    It expects a body with {'prompt': '...', 'documents': ['doc1', 'doc2']}
    """
    body = await request.json()
    user_prompt = body.get("prompt")
    documents = body.get("documents", [])

    # The middleware has already validated the prompt. Now call the LLM.
    result = await openai_service.get_grounded_completion(user_prompt, documents)
    if "error" in result:
        raise HTTPException(status_code=500, detail=result)
    return JSONResponse(content=result)
With this setup, any request to /chat/invoke is automatically passed through our entire safety pipeline. This is a clean, scalable, and non-intrusive way to enforce Responsible AI policies across your application.
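For a quick smoke test of the running API, a stdlib-only client sketch follows. The localhost URL assumes the default uvicorn main:app setup, and build_chat_request simply mirrors the request body the endpoint expects:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/chat/invoke"  # default local uvicorn address; adjust as needed


def build_chat_request(prompt: str, documents: list) -> dict:
    """Payload shape expected by /chat/invoke: a prompt plus grounding documents."""
    return {"prompt": prompt, "documents": documents}


def call_chat(prompt: str, documents: list) -> dict:
    """Send one request; requires the API to be running (uvicorn main:app)."""
    data = json.dumps(build_chat_request(prompt, documents)).encode()
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Try it first with a benign prompt, then with a deliberately hostile one ("Ignore all previous instructions…") to confirm the middleware returns a 400 with the content-safety details.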
Mapping Guardrails to EU AI Act Obligations
For organisations in Europe, the most pressing question is: "How does this help me comply with the EU AI Act?" The answer is that these technical controls map directly to specific legal obligations. Building this safety layer isn't just good engineering; it's a core component of your compliance strategy.
Here’s how each guardrail aligns with key articles of the act for high-risk AI systems:
| Guardrail | EU AI Act Obligation | How It Fulfills the Obligation |
|---|---|---|
| Content Safety API | Art. 9: Risk Management System | Identifies, evaluates, and mitigates the risks of generating harmful content (hate, violence, etc.) at the input stage. |
| Prompt Shields (Azure OpenAI deployment) | Art. 15: Accuracy, Robustness, and Cybersecurity | Defends the system against foreseeable misuse, manipulation, and prompt injection attacks at the model endpoint; complements the Content Safety pre-scan. |
| Groundedness Detection | Art. 13: Transparency & Provision of Information | Mitigates hallucinations by ensuring outputs are based on provided data, improving factual accuracy and transparency for users. |
| Groundedness Detection | Art. 14: Human Oversight Measures | Flags ungrounded or low-confidence content, creating a signal that enables effective human review and intervention. |
| Protected Material Detection | Art. 9: Risk Management System | Manages legal and intellectual property risks by detecting and filtering third-party copyrighted text and code. |
| Safety System Messages | Art. 13: Transparency & Provision of Information | Instructs the model to scope its behavior, refuse inappropriate requests, and be transparent about its limitations. |
| Comprehensive Logging | Art. 12: Record-keeping | Every middleware decision (blocks, flags, warnings) must be logged, creating an auditable trail of safety measures in action. |
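The logging row deserves code of its own. Here is a minimal sketch of a structured audit record emitted per guardrail decision; the field names are this guide's suggestion, not a schema mandated by the EU AI Act:

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("guardrail.audit")


def log_guardrail_decision(request_id: str, stage: str, decision: str, details: dict) -> str:
    """Emit one structured, machine-parseable audit record per guardrail decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "stage": stage,        # e.g. "content_safety_input", "groundedness_output"
        "decision": decision,  # "allow" | "warn" | "block"
        "details": details,
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line
```

Shipping these lines to a centralized sink (Log Analytics, for instance) gives you the auditable trail of safety measures that Art. 12 record-keeping expects.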
Conclusion: From Risky Bet to Production-Ready
Shipping a raw LLM into a production environment, especially in a regulated industry, is no longer a viable option. The risks of harmful content, catastrophic hallucinations, and prompt injection attacks are too great, and the regulatory landscape, led by the EU AI Act, demands concrete, demonstrable controls.
We've walked through a complete, field-tested pattern for building a multi-layered defense. By composing Azure AI Content Safety for fast input scanning and leveraging Azure OpenAI's deeply integrated filters for advanced threats like jailbreaks and ungroundedness, you can construct a robust, compliant, and trustworthy AI application. The FastAPI middleware pattern I've shown provides a flexible and scalable way to enforce these policies centrally, ensuring your LLM operates safely within the guardrails you define.
My Field Recommendation: Don't try to boil the ocean. Start with two guardrails: Azure AI Content Safety on all user inputs with conservative thresholds, and Groundedness Detection on all RAG-based outputs. In my experience, these two controls alone mitigate the large majority of common safety and quality issues in enterprise deployments. From there, enable Azure OpenAI Prompt Shields on the deployment and add Protected Material Detection as your application's risk profile requires.
Actionable Next Step: Take the FastAPI middleware code from this guide and integrate it into a new branch of your existing LLM application. Configure it with lenient thresholds and deploy it to a staging environment. Start collecting logs on what it flags and blocks. This data will be invaluable for tuning your policies before you enforce them in production. This is how you move from theory to a tangible, enterprise-grade safety system.