Fort Knox for AI: Securing Vertex AI Endpoints in Regulated Environments

TL;DR

I share my hands-on experience securing Vertex AI endpoints with zero-trust principles, implementing private connectivity, perimeter controls, and custom encryption. This guide addresses critical European data sovereignty requirements, including Google Cloud's S3NS offering.

Introduction

Deploying a machine learning model is a significant security risk if not properly isolated. I've learned that we must treat AI endpoints with the same rigorous zero-trust and data-loss-prevention standards as our core databases. As "sovereignty" (full-stack control over cloud services) increasingly becomes a priority in Europe, it adds new, critical requirements for how we architect our AI solutions.

When I first started building AI inference pipelines, it was easy to get caught up in model performance and throughput. But quickly, I realized that the shared responsibility model in cloud computing extends deeply into machine learning. Deploying an AI model, especially one handling sensitive data or operating in a critical business process, isn't just about API calls; it's about managing a significant attack surface. We're talking about potential data exfiltration through model inputs or outputs, model inversion attacks revealing training data, or even poisoning attacks compromising model integrity. These aren't theoretical risks; they are real threats that demand the same, if not greater, vigilance as our enterprise databases.

This essay will walk through how I secure Vertex AI endpoints to meet stringent regulatory demands, particularly focusing on European sovereignty requirements like France's SecNumCloud. I'll explore Google Cloud's offerings, including its "cloud de confiance" strategy and the role of S3NS, a joint venture designed to address these highly specific needs. My goal is to show you how to build a Fort Knox for your AI, ensuring your models are not only performant but also impenetrable.

Prerequisites

Before diving into hardening our Vertex AI endpoints, I ensure the following are in place:

  • Google Cloud CLI (gcloud): Authenticated and configured for my project.
  • GCP Project: A project with billing enabled and the necessary permissions to create network resources, IAM policies, and Vertex AI assets.
  • Required APIs Enabled: I always enable these upfront to avoid permission errors later on. I replace my-gcp-project-id with my actual project ID.
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
                     --project="my-gcp-project-id"

# Enable Compute Engine API (for VPC/PSC resources)
gcloud services enable compute.googleapis.com \
                     --project="my-gcp-project-id"

# Enable Cloud KMS API (for Customer-Managed Encryption Keys)
gcloud services enable cloudkms.googleapis.com \
                     --project="my-gcp-project-id"

# Enable Service Consumer Management API (for Private Service Connect)
gcloud services enable serviceconsumermanagement.googleapis.com \
                     --project="my-gcp-project-id"

# Enable Access Context Manager API (for VPC Service Controls)
gcloud services enable accesscontextmanager.googleapis.com \
                     --project="my-gcp-project-id"
  • Python packages (Vertex + Gemini): `pip install google-cloud-aiplatform` for MLOps tasks (model upload, private endpoints), and `pip install google-genai` for Gemini generative calls on Vertex AI (`import google.genai as genai`).

  • European Region Preference: For all resources, I consistently use European regions, typically europe-west1 (Belgium) or europe-west4 (Netherlands), with europe-north1 (Finland) as an alternative. Consistent region selection is crucial for data residency and for minimizing latency.

    SDK note (2026): The generative AI modules in the Vertex AI Python SDK (vertexai and the Generative AI APIs under google.cloud.aiplatform) are deprecated with removal scheduled for 24 June 2026 — migrate to google-genai (pip install google-genai, import google.genai as genai) for Gemini and generative workloads. MLOps flows such as Model.upload() and Endpoint.create() for custom prediction endpoints still use google-cloud-aiplatform today; follow Google's migration guide.

Model IDs on Vertex AI: Prefer gemini-2.5-flash or gemini-2.5-pro. gemini-2.0-* variants are restricted for new customers from March 2026 and reach end-of-life 1 June 2026.

Conceptual Implementation Blueprint: A conceptual implementation blueprint for these configurations, demonstrating the Terraform structure and Python SDK interactions discussed, lives in a private repository in my own workflow, so it is not available as a public link.

Architecture & Concepts

Securing AI endpoints isn't just about a single setting; it's a layered approach. I think of it as concentric rings of defense around critical machine learning assets. Our goal is to isolate traffic, define clear perimeters, encrypt data, and control access with extreme granularity. My focus here is to integrate these security measures into a cohesive architecture for Vertex AI, especially with European sovereignty in mind.

Core Architectural Principles

  1. Network Isolation: Bypassing the public internet entirely for inference traffic is non-negotiable for sensitive workloads. This prevents common network-based attacks and data interception.
  2. Perimeter Security (VPC-SC): Establishing guardrails against data exfiltration and unauthorized cross-project access. This acts as a logical air gap.
  3. Data Protection: Ensuring models and data are encrypted at rest and in transit using customer-managed keys. This gives me direct control over my cryptographic material.
  4. Least Privilege Access: Implementing granular IAM roles, separating the concerns of model deployment from model invocation. This minimizes the blast radius of any compromised credentials.
  5. Model Governance: Implementing practices like model signing (e.g., using Sigstore/Cosign), vulnerability scanning of model dependencies, and comprehensive audit logging for all model lifecycle events. This builds trust and traceability into the AI supply chain.
  6. Sovereignty Compliance: Aligning with regulatory frameworks by leveraging specific cloud offerings and partnerships, ensuring legal and operational control resides within compliant jurisdictions.

Here's how these components fit together for a highly secure Vertex AI deployment:

When it comes to network isolation, I typically choose Private Service Connect (PSC) over VPC Peering for connecting to managed services like Vertex AI. While VPC Peering works, PSC offers a cleaner separation. With PSC, Vertex AI services present themselves as endpoints directly within my VPC, consuming an internal IP address. This removes the need for transitive routing or managing peered networks, simplifying my network architecture and reducing the attack surface. It’s a dedicated, internal network connection, bypassing the public internet completely.

VPC Service Controls (VPC-SC) is my digital Fort Knox wall. It lets me define security perimeters around my GCP projects and the services within them, like Vertex AI and Cloud Storage. Any attempt to move data or access resources from outside this perimeter is blocked, regardless of IAM permissions. This is a critical line of defense against data exfiltration, ensuring that even if an attacker compromises a service account, they cannot extract data outside my defined perimeter. It effectively creates a policy-based air gap for my sensitive services.

For Data at Rest and in Transit, Customer-Managed Encryption Keys (CMEK) are paramount. While Google encrypts data by default, CMEK gives me cryptographic control over the keys that protect my models in Vertex AI Model Registry and their associated artifacts in Cloud Storage. This is a common requirement in regulated industries. For data in transit, Vertex AI private endpoints inherently use TLS (Transport Layer Security), ensuring all communication between my VPC and the Vertex AI service is encrypted and secure.

Finally, Identity & Access Management (IAM) needs to be surgical. I always advocate for separate service accounts for distinct actions. The identity that deploys a model should be different from the service account that invokes the endpoint for inference. This adheres to the principle of least privilege, minimizing the blast radius of any compromised credential. For instance, a model-deployer service account might have aiplatform.admin on models, while an inference-invoker service account would only have aiplatform.user on specific endpoints.
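To make the deploy/invoke split concrete, here is a minimal sketch of how I reason about the two identities. The service account names and role lists are illustrative examples, not prescriptions:

```python
# Illustrative sketch of the deploy/invoke privilege split described above.
# Account names and role assignments are examples for this guide only.

ROLE_BINDINGS = {
    "model-deployer@my-gcp-project-id.iam.gserviceaccount.com": [
        "roles/aiplatform.admin",  # manage models and endpoints
    ],
    "inference-invoker@my-gcp-project-id.iam.gserviceaccount.com": [
        "roles/aiplatform.user",   # invoke predictions only
    ],
}

def can_deploy(member: str) -> bool:
    """True if the member holds a role that permits model deployment."""
    return "roles/aiplatform.admin" in ROLE_BINDINGS.get(member, [])

def can_invoke(member: str) -> bool:
    """True if the member can call endpoints (admin implies user here)."""
    roles = ROLE_BINDINGS.get(member, [])
    return "roles/aiplatform.user" in roles or "roles/aiplatform.admin" in roles

# The invoker can call endpoints but cannot deploy: a compromise of its
# credentials cannot alter or delete models.
invoker = "inference-invoker@my-gcp-project-id.iam.gserviceaccount.com"
assert can_invoke(invoker) and not can_deploy(invoker)
```

A check like this can run in CI against exported IAM policies, flagging any identity that accumulates both capabilities.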

The European Sovereignty Layer: S3NS and SecNumCloud

In Europe, the conversation around data sovereignty and digital trust is intensifying. The concept of "cloud de confiance" is gaining traction, particularly in France with its SecNumCloud certification. But to truly understand what these requirements mean for AI workloads on GCP, it is worth going beyond the name-drops and examining the mechanics of why standard cloud security controls — even hardened ones like those covered in this guide — are not sufficient for organisations operating under French public-sector mandates or strict EU sovereignty requirements.

The Root Problem: Extraterritorial Law

Before we talk about S3NS, we need to understand why it exists. The US CLOUD Act (Clarifying Lawful Overseas Use of Data Act, 2018) means that a US company — including Google — can be compelled to hand over data it controls, even if that data is physically stored in Europe. Standard cloud security controls (VPC Service Controls, CMEK, Private Service Connect) protect you from external threats and misconfigurations, but they do not change the fundamental jurisdictional reality: Google is a US entity, and its operational staff can theoretically be compelled to act under US law. This is what SecNumCloud and S3NS address head-on.

What is SecNumCloud?

SecNumCloud is a qualification framework published by ANSSI (Agence nationale de la sécurité des systèmes d'information), France's national cybersecurity agency. Version 3.2 — the current and operationally significant version — comprises 276 requirements across 15 security domains: operational security, cryptology, access control, network security, business continuity, incident management, and critically, protection against extraterritorial laws. That final domain is what sets SecNumCloud apart from other European security certifications like ISO 27001 or BSI C5. It is not purely a cybersecurity standard; it is simultaneously a jurisdictional control framework. A provider can only receive SecNumCloud qualification if it can demonstrate that foreign governments have no legal or technical mechanism to compel access to customer data.

S3NS: Structure and Corporate Design

S3NS (pronounced sens, French for "sense") is a joint venture structured to be legally immune to US extraterritorial pressure from inception. Key facts:

  • Majority ownership: Thales holds the majority stake; Google Cloud holds approximately 20%
  • Legal jurisdiction: S3NS is a French société par actions simplifiée (SAS), legally obligated exclusively under European law
  • Consequence: Any US government data request directed at S3NS can be — and would be — rejected; there is no legal mechanism through which Google can unilaterally compel S3NS to act

The corporate structure is intentional sovereignty engineering: Google brings the technology, Thales and S3NS bring the European legal wrapper and operational control.

Two Tiers: CRYPT3NS vs. PREMI3NS

S3NS offers two distinct service tiers, and conflating them is a common mistake:

|                           | CRYPT3NS                                            | PREMI3NS                                                                             |
| ------------------------- | --------------------------------------------------- | ------------------------------------------------------------------------------------ |
| Infrastructure            | Standard GCP regions                                | Dedicated, isolated from Google's public cloud                                       |
| SecNumCloud qualification | No                                                  | Yes (qualified December 17, 2025)                                                    |
| Google personnel access   | Standard GCP support model                          | Zero — Google has no physical or logical access                                      |
| Suitable for              | Enhanced data residency, software-layer sovereignty | Highly regulated workloads: defence, healthcare, government, critical infrastructure |
| CLOUD Act immunity        | Partial (legal structure only)                      | Full (legal + technical)                                                             |

For most private-sector EU organisations, CRYPT3NS provides meaningful protection via data residency controls and legal jurisdiction. For organisations under strict French public procurement rules, defence supply chains, or sectors explicitly requiring SecNumCloud qualification, only PREMI3NS meets the bar.

How PREMI3NS Achieves Technical Isolation

This is the core question: how can GCP services run in a data centre over which Google has no control?

The answer is Google Cloud Dedicated — an architecture where Google licenses its technology stack (compute, storage, networking software) to S3NS, which then deploys and operates it entirely on its own physical infrastructure. Several deliberate control mechanisms make this possible:

  1. Physically isolated infrastructure: PREMI3NS runs on its own compute, storage, and networking hardware in French data centres. These systems are entirely separate from Google's global public cloud — there is no shared fabric and no common control plane accessible to Google employees.

  2. Zero Google access: Google personnel have no physical or logical access to the PREMI3NS environment. Operations, administration, and support are performed exclusively by S3NS employees located within the EU. Escalations that would normally reach Google engineering are handled internally by S3NS.

  3. Update quarantine model: When Google releases a software update — kernel patches, service updates, new features — S3NS intercepts it in a quarantine environment. S3NS engineers perform automated reverse-engineering and security analysis on the binaries before deciding whether to deploy them to production. S3NS is not passively running Google's code; it is an active gatekeeper of every software change that reaches its infrastructure.

  4. Sovereign cryptographic control: S3NS manages the cryptographic roots of trust for the PREMI3NS environment. Unlike CMEK on standard GCP, where you manage keys but Google still operates the infrastructure, on PREMI3NS the entire key management hierarchy sits under S3NS and customer control — not Google's.

  5. Dual SOC model: S3NS operates its own Security Operations Centre in coordination with Thales's ANSSI-certified P10 SOC, providing dual monitoring for abnormal behaviour including any attempt to introduce a Google-accessible access path.

Current Service Catalogue and the Vertex AI Roadmap

As of the December 2025 qualification, PREMI3NS covers the core infrastructure for most regulated workloads: Compute Engine, Cloud GPUs (NVIDIA H100s), Cloud Storage, networking (DNS, VPN, Load Balancing, Cloud Armor, Interconnect), BigQuery Enterprise, Pub/Sub, GKE Autopilot, Cloud SQL Enterprise Plus, and operations tooling (Monitoring, Logging). A second wave — Cloud Run, Cloud Build, Cloud Spanner, Cloud Bigtable, Secret Manager, Confidential VMs, and Admin Access Transparency — is pending qualification for H1 2026.

For Vertex AI specifically, core foundations (Model Garden for open-weight models, Model Registry, Workbench, and Online Inference) are on the roadmap for H2 2026. Gemini will not be in the initial Model Garden — only open-source and open-weight models are included initially, which is a material consideration for sovereign AI inference design today. If you need SecNumCloud-qualified inference right now, the practical path is deploying your own models on GKE Autopilot with H100 GPUs on PREMI3NS. Managed Vertex AI on PREMI3NS is imminent, not yet delivered.

When to Use S3NS vs Standard GCP Hardening

The controls covered throughout the rest of this guide (PSC, VPC-SC, CMEK) are highly effective for the vast majority of regulated EU workloads. The following heuristic helps decide which tier is appropriate:

  • Standard hardened GCP (this guide): sufficient for GDPR, NIS2, ISO 27001, banking/insurance frameworks, and any organisation where operational isolation from US law is not an explicit procurement requirement
  • CRYPT3NS: appropriate when French legal jurisdiction and data residency guarantees are needed but full infrastructure isolation is not mandated
  • PREMI3NS: mandatory when SecNumCloud qualification is required — French OIV (Opérateurs d'Importance Vitale), defence supply chain, sensitive government data, or any procurement specification that explicitly references SecNumCloud

The €180 million EU sovereign cloud contract award involving S3NS, and Thales's own adoption of SAP RISE on PREMI3NS, signal that this model has moved from pilot to production at scale.

Implementation Guide

Let's put these concepts into practice. I'll guide you through configuring a secure Vertex AI endpoint in europe-west1.

1. Configure Private Service Connect (PSC) for Vertex AI

First, I establish a private connection to Vertex AI. This involves creating a Private Service Connect endpoint in my VPC that connects to the Vertex AI service producer network. Note that Vertex AI internally provisions its own service attachment; I only need to configure the consumer side in my project.

# main.tf
# Define variables for your project and region
variable "gcp_project_id" {
  description = "The ID of your GCP project."
  type        = string
}

variable "gcp_region" {
  description = "The GCP region for resources (e.g., europe-west1)"
  type        = string
  default     = "europe-west1" # Always prefer European regions
}

variable "vpc_network_name" {
  description = "The name of your existing VPC network."
  type        = string
  default     = "vertex-ai-inference-vpc" # Example VPC name
}

# Configure the GCP provider
provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

# Create a Private Service Connect endpoint for Vertex AI
# This service attachment name is specific to Vertex AI Prediction
# and is a Google-managed producer endpoint.
resource "google_compute_network_attachment" "vertex_ai_psc_endpoint" {
  name        = "vertex-ai-psc-endpoint"
  project     = var.gcp_project_id
  region      = var.gcp_region
  description = "PSC endpoint for Vertex AI Prediction service."
  network     = "projects/${var.gcp_project_id}/global/networks/${var.vpc_network_name}"

  # The service_attachment_names array must reference the Google-managed producer attachment.
  # For Vertex AI Prediction, the format is 'projects/REGION-aiplatform/regions/REGION/serviceAttachments/aiplatform-producer-REGION'
  # Replace REGION with the specific region you are deploying to (e.g., europe-west1).
  service_attachment_names = ["projects/${var.gcp_region}-aiplatform/regions/${var.gcp_region}/serviceAttachments/aiplatform-producer-${var.gcp_region}"]
}

output "psc_endpoint_link" {
  description = "The self_link of the Private Service Connect network attachment."
  value       = google_compute_network_attachment.vertex_ai_psc_endpoint.self_link
}

# Note: For managed services like Vertex AI, you typically interact via DNS names
# that resolve to the private IP behind the scenes. This PSC endpoint enables that private resolution.

This Terraform configuration sets up the consumer side of PSC. The key is correctly identifying the service_attachment_names for Vertex AI Prediction in your chosen region. Once deployed, any traffic from your vertex-ai-inference-vpc to this endpoint will be routed privately to Vertex AI. While the output psc_endpoint_link provides the resource link, for managed services like Vertex AI, you typically interact via DNS names that resolve to this private IP behind the scenes. Consult the official Private Service Connect documentation for the most up-to-date service attachment names.
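Because the producer attachment name is easy to mistype, I sometimes keep a small helper that mirrors the naming convention used in the Terraform above. The format is the one assumed in this guide and may change, so always confirm it against the current Private Service Connect documentation before relying on it:

```python
# Helper mirroring the Vertex AI Prediction service-attachment naming
# convention used in the Terraform example above. This format is an
# assumption of this guide; verify it against Google's current PSC docs.

def vertex_ai_service_attachment(region: str) -> str:
    return (
        f"projects/{region}-aiplatform/regions/{region}"
        f"/serviceAttachments/aiplatform-producer-{region}"
    )

print(vertex_ai_service_attachment("europe-west1"))
# projects/europe-west1-aiplatform/regions/europe-west1/serviceAttachments/aiplatform-producer-europe-west1
```

Generating the string in one place keeps Terraform variables and any Python tooling in agreement when you switch regions.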

2. Implement VPC Service Controls (VPC-SC) Perimeter

Next, I define a security perimeter around my projects and critical services. This example assumes you have an organization defined in GCP, which is a prerequisite for VPC-SC.

# vpc_sc.tf
resource "google_access_context_manager_service_perimeter" "vertex_ai_perimeter" {
  parent = "organizations/YOUR_ORGANIZATION_ID" # IMPORTANT: Replace with your actual organization ID
  name   = "accessPolicies/YOUR_ACCESS_POLICY_ID/servicePerimeters/vertex_ai_sec_perimeter" # IMPORTANT: Replace with your actual Access Policy ID
  title  = "vertex-ai-sec-perimeter"

  perimeter_type = "REGULAR"

  status {
    restricted_services = [
      "aiplatform.googleapis.com",
      "storage.googleapis.com" # Critical for models and artifacts in Cloud Storage
    ]

    # If you have specific access levels defined (e.g., trusted IP ranges or device policies),
    # you would list them here. For this example, we assume basic project inclusion.
    # For creating an access policy and level, refer to Google Cloud's Access Context Manager docs.
    # access_levels = [
    #   "accessPolicies/YOUR_ACCESS_POLICY_ID/accessLevels/trusted_networks_and_devices" # Example access level
    # ]

    resources = [
      "projects/${var.gcp_project_id}" # Your project where Vertex AI is deployed
    ]
  }

  description = "VPC-SC perimeter for Vertex AI and associated storage."
}

This Terraform block creates a VPC-SC perimeter. I include aiplatform.googleapis.com and storage.googleapis.com in restricted_services because my models and datasets often reside in Cloud Storage. The resources field specifies which projects are protected by this perimeter. Remember, VPC-SC policies are evaluated after IAM, so even if an identity has storage.admin permissions, if they are outside the perimeter or violate an access level, their request will be denied. This provides a robust data exfiltration prevention layer.

Note: Setting up access_levels and access_policies requires careful planning at the organization level. For a detailed guide, refer to the VPC Service Controls documentation.
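To build intuition for the "VPC-SC is evaluated after IAM" behaviour, here is a deliberately simplified Python model of a perimeter decision. This is a teaching sketch with an invented config shape, not how Google's enforcement engine actually works:

```python
# Highly simplified model of perimeter evaluation: IAM is checked first,
# then the perimeter. A teaching sketch only; the real enforcement engine
# also considers access levels, ingress/egress rules, and request context.

PERIMETER = {
    "restricted_services": {"aiplatform.googleapis.com", "storage.googleapis.com"},
    "resources": {"projects/my-gcp-project-id"},
}

def request_allowed(service: str, caller_project: str, has_iam: bool) -> bool:
    if not has_iam:
        return False  # IAM denies before the perimeter is even consulted
    if service not in PERIMETER["restricted_services"]:
        return True   # service not protected by this perimeter
    # For a restricted service, the caller must originate inside the perimeter.
    return f"projects/{caller_project}" in PERIMETER["resources"]

# A compromised credential with storage.admin still cannot exfiltrate
# from outside the perimeter:
assert not request_allowed("storage.googleapis.com", "attacker-project", has_iam=True)
assert request_allowed("storage.googleapis.com", "my-gcp-project-id", has_iam=True)
```

The key takeaway the sketch encodes: valid IAM permissions are necessary but not sufficient once a perimeter restricts the service.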

3. Configure Customer-Managed Encryption Keys (CMEK)

Taking control of your encryption keys is a vital step for regulatory compliance. I'll create a Cloud KMS key ring and key, then grant the necessary permissions for Vertex AI to use it.

# Step 3a: Create a KMS Key Ring and Key
# Key rings organize keys; keys perform encryption/decryption.
KMS_KEYRING_NAME="vertex-ai-model-keyring"
KMS_KEY_NAME="vertex-ai-model-key"
KMS_LOCATION="europe-west1" # Match your Vertex AI region for optimal latency
PROJECT_ID="my-gcp-project-id" # Replace with your actual project ID

gcloud kms keyrings create "${KMS_KEYRING_NAME}" \
       --location="${KMS_LOCATION}" \
       --project="${PROJECT_ID}"

gcloud kms keys create "${KMS_KEY_NAME}" \
       --keyring="${KMS_KEYRING_NAME}" \
       --location="${KMS_LOCATION}" \
       --purpose="encryption" \
       --project="${PROJECT_ID}"

# Step 3b: Grant Vertex AI Service Agent permissions to use the KMS Key
# The Vertex AI Service Agent is specific to your project and region.
# Format: service-<PROJECT_NUMBER>@gcp-sa-aiplatform.iam.gserviceaccount.com
# Get your project number:
PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
VERTEX_AI_SERVICE_AGENT="service-${PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com"

KMS_KEY_RESOURCE="projects/${PROJECT_ID}/locations/${KMS_LOCATION}/keyRings/${KMS_KEYRING_NAME}/cryptoKeys/${KMS_KEY_NAME}"

gcloud kms keys add-iam-policy-binding "${KMS_KEY_NAME}" \
       --location="${KMS_LOCATION}" \
       --keyring="${KMS_KEYRING_NAME}" \
       --member="serviceAccount:${VERTEX_AI_SERVICE_AGENT}" \
       --role="roles/cloudkms.cryptoKeyEncrypterDecrypter" \
       --project="${PROJECT_ID}"

echo "KMS Key Resource Name: ${KMS_KEY_RESOURCE}"

Expected Output (example for key creation):

Created keyring [vertex-ai-model-keyring].
Created key [vertex-ai-model-key].
Updated IAM policy for key [vertex-ai-model-key].
KMS Key Resource Name: projects/my-gcp-project-id/locations/europe-west1/keyRings/vertex-ai-model-keyring/cryptoKeys/vertex-ai-model-key

Now, when I upload or register a model with Vertex AI, I specify this KMS_KEY_RESOURCE to ensure my model artifacts are encrypted with my CMEK. This is typically done during the Model.upload() operation in the Vertex AI SDK or via the REST API:

# Python 3.13+ example for uploading a model with CMEK
from google.cloud import aiplatform

PROJECT_ID = "my-gcp-project-id" # Replace with your actual project ID
REGION = "europe-west1"
KMS_KEYRING_NAME = "vertex-ai-model-keyring" # Must match the key ring created in Step 3a
KMS_KEY_NAME = "vertex-ai-model-key"         # Must match the key created in Step 3a
MODEL_DISPLAY_NAME = "my-secure-model"
MODEL_ARTIFACT_URI = "gs://my-secure-model-bucket/path/to/model/artifacts/" # Replace with your GCS bucket for model artifacts
SERVING_CONTAINER_IMAGE_URI = "europe-west1-docker.pkg.dev/cloud-aiplatform/prediction/tf2-cpu.2-11:latest"
KMS_KEY_RESOURCE = f"projects/{PROJECT_ID}/locations/{REGION}/keyRings/{KMS_KEYRING_NAME}/cryptoKeys/{KMS_KEY_NAME}"

# Initialize the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)

# Upload the model with CMEK
model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    artifact_uri=MODEL_ARTIFACT_URI,
    serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    encryption_spec_key_name=KMS_KEY_RESOURCE,
    sync=True
)

print(f"Model uploaded and encrypted with CMEK: {model.resource_name}")

This ensures that your model, when stored in the registry and associated Cloud Storage buckets, is protected by your cryptographic keys, fulfilling a critical sovereignty requirement.
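Because a malformed encryption_spec_key_name only fails at deployment time, a small validator in CI can catch it earlier. This helper and its regex are my own convention, not part of any Google SDK:

```python
# Hypothetical CI-side validator for CMEK resource names, catching typos
# before Model.upload() fails at runtime. My convention, not a Google API.
import re

KMS_KEY_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<keyring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)

def parse_kms_key(resource_name: str) -> dict:
    """Split a KMS crypto key resource name into its components."""
    m = KMS_KEY_PATTERN.match(resource_name)
    if not m:
        raise ValueError(f"Not a valid KMS crypto key resource name: {resource_name}")
    return m.groupdict()

parts = parse_kms_key(
    "projects/my-gcp-project-id/locations/europe-west1"
    "/keyRings/vertex-ai-model-keyring/cryptoKeys/vertex-ai-model-key"
)
assert parts["location"] == "europe-west1"
```

A useful extra check in the same spirit: assert that the parsed key location matches the Vertex AI region, since a key in the wrong location will be rejected.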

4. Granular IAM Roles for Inference

Implementing least privilege is crucial. I separate the permissions for model deployment from those for model invocation. Here's how I create a dedicated service account for inference and grant it only the necessary permissions.

# Step 4a: Create a dedicated service account for inference
PROJECT_ID="my-gcp-project-id" # Replace with your actual project ID
INFERENCE_SA_NAME="vertex-inference-sa"
INFERENCE_SA_DESCRIPTION="Service account for invoking Vertex AI endpoints."

gcloud iam service-accounts create "${INFERENCE_SA_NAME}" \
       --display-name="${INFERENCE_SA_DESCRIPTION}" \
       --project="${PROJECT_ID}"

INFERENCE_SA_EMAIL="${INFERENCE_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Step 4b: Grant necessary roles for endpoint invocation
# aiplatform.user allows invoking predictions on endpoints.
# aiplatform.endpointViewer might also be needed to view endpoint details.

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
       --member="serviceAccount:${INFERENCE_SA_EMAIL}" \
       --role="roles/aiplatform.user" \
       --project="${PROJECT_ID}"

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
       --member="serviceAccount:${INFERENCE_SA_EMAIL}" \
       --role="roles/aiplatform.endpointViewer" \
       --project="${PROJECT_ID}"

echo "Inference Service Account: ${INFERENCE_SA_EMAIL}"

Expected Output (example):

Created service account [vertex-inference-sa].
Updated policy for project [my-gcp-project-id].
Updated policy for project [my-gcp-project-id].
Inference Service Account: vertex-inference-sa@my-gcp-project-id.iam.gserviceaccount.com

When your client application or other services need to call the Vertex AI endpoint, they should impersonate this vertex-inference-sa service account or use its credentials. This minimizes the risk: if this service account is compromised, it can only invoke predictions and cannot, for example, delete models or modify your Vertex AI setup.

5. Deploy Model to Private Endpoint

Finally, I deploy the CMEK-encrypted model to an endpoint that leverages our private network configuration. I'll define an endpoint resource and deploy the model we uploaded earlier.

# Python 3.13+ example for deploying to a private endpoint
from google.cloud import aiplatform

PROJECT_ID = "my-gcp-project-id" # Replace with your actual project ID
REGION = "europe-west1"
MODEL_DISPLAY_NAME = "my-secure-model" # Matches the display name used at upload
MODEL_RESOURCE_NAME = model.resource_name # From previous step's model upload (same Python session)
ENDPOINT_DISPLAY_NAME = "my-secure-inference-endpoint"

# Initialize the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)

# Define the endpoint configuration.
# Note: VPC-peering private endpoints (network=...) and Private Service Connect
# endpoints are mutually exclusive modes on Vertex AI; choose one. Since this
# guide uses PSC, we create a PSC-enabled private endpoint and allowlist our
# own project as a consumer.
endpoint = aiplatform.PrivateEndpoint.create(
    display_name=ENDPOINT_DISPLAY_NAME,
    project=PROJECT_ID,
    location=REGION,
    private_service_connect_config=aiplatform.PrivateEndpoint.PrivateServiceConnectConfig(
        project_allowlist=[PROJECT_ID]
    ),
    sync=True
)

# Deploy the model to the endpoint
# You might specify machine type and other scaling settings here.
# This example assumes a simple deployment.
endpoint.deploy(
    model=model,
    deployed_model_display_name=f"{MODEL_DISPLAY_NAME}-deployed",
    machine_type="n1-standard-4", # Example machine type
    min_replica_count=1,
    max_replica_count=2,
    sync=True
)

print(f"Model {MODEL_DISPLAY_NAME} deployed to private endpoint: {endpoint.resource_name}")

This deployment ensures that your model is hosted on an endpoint accessible only via your configured Private Service Connect link, operating within the VPC Service Controls perimeter, and using your customer-managed encryption keys.
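Before promoting a deployment, I like to encode the layered controls as an automated checklist. The config shape below is invented for illustration; adapt it to however your pipeline describes deployments (Terraform outputs, CI variables):

```python
# Sketch of a pre-production checklist encoding the controls this guide
# layers together. The cfg dict shape is hypothetical, for illustration.

def security_checklist(cfg: dict) -> list[str]:
    """Return a list of violations; an empty list means the deployment passes."""
    problems = []
    if not cfg.get("private_endpoint"):
        problems.append("endpoint is not private (PSC/peering not enabled)")
    if not cfg.get("cmek_key"):
        problems.append("no customer-managed encryption key configured")
    if not cfg.get("region", "").startswith("europe-"):
        problems.append("resource is outside a European region")
    return problems

cfg = {
    "private_endpoint": True,
    "cmek_key": "projects/my-gcp-project-id/locations/europe-west1"
                "/keyRings/vertex-ai-model-keyring/cryptoKeys/vertex-ai-model-key",
    "region": "europe-west1",
}
assert security_checklist(cfg) == []
```

Failing the build on a non-empty list turns the security posture from a manual review item into a gate.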

Complete Example Repository: You can find more official examples of secure Vertex AI patterns at github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/community-content/vertex_security_examples.

Additional Code Examples: For advanced deployment patterns and CI/CD integration for Vertex AI, a repository with my more complex implementations typically lives in my private workflow.

Troubleshooting & Verification

Securing an environment can be complex, and verification is key. Here's how I confirm everything is working as expected and troubleshoot common issues.

Verification Commands

To verify that your endpoint is private and secured, you should attempt to invoke it from within your VPC network, using the service account with restricted permissions.

# Python 3.13+ — verify a private Vertex AI setup (custom endpoint vs Gemini)

from google.oauth2 import service_account

PROJECT_ID = "my-gcp-project-id"  # Replace with your actual project ID
REGION = "europe-west1"
# Replace with your endpoint's full resource name
ENDPOINT_RESOURCE_NAME = "projects/123456789012/locations/europe-west1/endpoints/1234567890"
INFERENCE_SA_KEY_PATH = "/path/to/your/vertex-inference-sa-key.json"

credentials = service_account.Credentials.from_service_account_file(INFERENCE_SA_KEY_PATH)

# --- Custom / prediction endpoint (MLOps): Vertex AI SDK ---
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, credentials=credentials)
endpoint = aiplatform.Endpoint(ENDPOINT_RESOURCE_NAME)
instances = [
    {"feature_1": 1.0, "feature_2": "value"},
    {"feature_1": 2.5, "feature_2": "another_value"},
]
try:
    prediction = endpoint.predict(instances=instances)
    print("Prediction successful:")
    print(prediction.predictions)
except Exception as e:
    print(f"Prediction failed: {e}")
    print("Verify network, IAM, and VPC-SC configurations.")

# --- Gemini on Vertex (generative): Google Gen AI SDK (replaces deprecated vertexai generative module) ---
import google.genai as genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=REGION)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here",
)
print(response.text)

Expected Output:

Prediction successful:
[model output data]

If you attempt to call this endpoint from outside your VPC (e.g., from your local machine without VPN/Cloud Interconnect to your VPC), or without using the vertex-inference-sa credentials, you should expect a connection error or permission denied message, which confirms your security controls are active.
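To make that negative test automatable rather than eyeballed, I use a small triage helper that classifies why a call failed. This is a sketch of my own approach, not a Vertex AI SDK feature, and the message substrings it matches are assumptions about typical Google Cloud error text:

```python
def classify_blocked_call(invoke) -> str:
    """Run `invoke` and report why it was blocked, or 'NOT_BLOCKED'."""
    try:
        invoke()
        return "NOT_BLOCKED"  # the call succeeded: your controls are NOT active
    except Exception as exc:
        msg = str(exc)
        # VPC-SC violations usually carry this phrase in the error body.
        if "prohibited by organization's policy" in msg:
            return "BLOCKED_BY_VPC_SC"
        # Plain IAM denials surface as 403 / PERMISSION_DENIED.
        if "403" in msg or "PERMISSION_DENIED" in msg:
            return "BLOCKED_BY_IAM"
        # Anything else (timeout, DNS failure) points at the network path.
        return "BLOCKED_BY_NETWORK"


# Example, reusing `endpoint` and `instances` from the snippet above.
# From OUTSIDE the VPC, anything other than NOT_BLOCKED confirms the perimeter:
# result = classify_blocked_call(lambda: endpoint.predict(instances=instances))
# assert result != "NOT_BLOCKED", "Endpoint reachable from outside the VPC!"
```

Running this in CI from an untrusted network gives you a regression test for the perimeter itself, not just the model.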

Common Errors & Solutions

  1. Error: VPC Service Controls Violation
`Request is prohibited by organization's policy.`
**Solution:** This typically means a request (e.g., trying to access a bucket outside the perimeter or from a non-compliant IP) is blocked by VPC-SC. Double-check your `google_access_context_manager_service_perimeter` configuration. Ensure your client's IP range or access level is correctly defined within the access policy if the request is legitimate. Also, verify that all services your operation relies on (like Cloud Storage for model artifacts) are included in `restricted_services` and that the project is listed in `resources` within the perimeter. For specific details on access levels, refer to the [Google Cloud Access Context Manager documentation](https://cloud.google.com/access-context-manager/docs/overview).
  2. Error: IAM Permission Denied
`403 Permission denied on resource projects/PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID`
**Solution:** The service account or user making the request lacks the necessary `aiplatform.user` role (or `aiplatform.endpointViewer` if you're just trying to view endpoint details) on the Vertex AI endpoint or project. Verify the IAM policy bindings for your inference service account using `gcloud iam service-accounts get-iam-policy INFERENCE_SA_EMAIL` and `gcloud projects get-iam-policy my-gcp-project-id`. Ensure the correct roles are granted at the appropriate scope.
  3. Error: Private Service Connect Connectivity Issue
`Could not connect to the endpoint. Check network configuration.`
**Solution:** This implies an issue with the private network path. Verify that your `google_compute_network_attachment` is provisioned correctly and its status is `ACCEPTED`. Check firewall rules in your VPC to ensure traffic from your client to the PSC endpoint's IP range is allowed. Also, confirm that the DNS resolution for the Vertex AI endpoint is resolving to an internal IP within your VPC, not a public one. If using `gcloud`, ensure your client is running from a VM within the same VPC or connected via VPN/Interconnect.

Conclusion & Next Steps

Securing Vertex AI endpoints in regulated environments is a multi-faceted challenge, but with Google Cloud's capabilities, it's entirely achievable. By implementing a layered defense strategy encompassing Private Service Connect for network isolation, VPC Service Controls for perimeter security, CMEK for data-at-rest encryption, and granular IAM for least-privilege access, we can build robust, compliant, and highly secure AI inference systems. The journey towards digital sovereignty in Europe is evolving, and solutions like S3NS, with its blend of hyperscaler innovation and local operational control, are key enablers for organizations navigating this complex landscape.

My experience building these types of systems tells me that proactive security design is far more effective than reactive patching. Treat your AI endpoints like the critical infrastructure they are, and you'll build trust and resilience into your machine learning operations from day one. There's no compromise when it comes to data security and regulatory compliance in today's cloud-native world.

The Sovereignty Imperative

When designing for European sovereignty, remember that it's not just about data residency. It's increasingly about operational control and legal jurisdiction. S3NS, as a joint venture with Thales, aims to provide that additional layer of trust and control necessary for highly regulated sectors. It's a strategic move to blend the scalability of hyperscalers with the assurance of local control.

Key Takeaways:

  • Zero-Trust for AI: Apply network isolation, perimeter security, and least privilege to AI endpoints as rigorously as for core databases.
  • Private Service Connect: The preferred method for private, secure connectivity to Vertex AI, simplifying network architecture over VPC Peering.
  • VPC Service Controls: Essential for data exfiltration prevention and establishing strong perimeters around Vertex AI and associated services.
  • CMEK: Gives you cryptographic control over model data, a critical compliance requirement, and addresses data-at-rest encryption.
  • Sovereignty Solutions: Understand offerings like S3NS that address specific European regulatory needs like SecNumCloud.
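The CMEK takeaway is also worth verifying programmatically, since an endpoint silently falling back to Google-managed keys is a compliance gap. A sketch of how I check it; the expected key name is a hypothetical placeholder, and the commented-out live check assumes the `aiplatform` SDK from the verification snippet earlier:

```python
import re

# Hypothetical CMEK key that the endpoint is supposed to use.
EXPECTED_KMS_KEY = (
    "projects/my-gcp-project-id/locations/europe-west1/"
    "keyRings/vertex-keyring/cryptoKeys/vertex-cmek"
)

# Shape of a Cloud KMS key resource name:
# projects/{p}/locations/{l}/keyRings/{r}/cryptoKeys/{k}
KMS_KEY_RE = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def is_valid_kms_key_name(name: str) -> bool:
    """Basic sanity check on a Cloud KMS key resource name."""
    return bool(KMS_KEY_RE.match(name))


# Live check (sketch): read the key actually attached to the endpoint.
# from google.cloud import aiplatform
# endpoint = aiplatform.Endpoint(ENDPOINT_RESOURCE_NAME)
# actual = endpoint.gca_resource.encryption_spec.kms_key_name
# assert actual == EXPECTED_KMS_KEY, f"Endpoint not using expected CMEK: {actual!r}"

print(is_valid_kms_key_name(EXPECTED_KMS_KEY))  # True
```

An empty `encryption_spec` on the live resource means the endpoint was created without CMEK and should be redeployed, not patched.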

Next Steps:

  1. Review your organization's Access Context Manager policies: Ensure they align with your security posture for AI workloads, especially concerning data ingress/egress. This is fundamental for robust perimeter security.
  2. Evaluate S3NS: For organizations operating under strict European sovereignty mandates, investigate the PREMI3NS by S3NS offering from Google Cloud and Thales. Consider how its operational controls and legal guarantees meet your specific compliance needs.
  3. Implement CI/CD for secure deployments: Automate the deployment of models and endpoints with integrated security checks, ensuring consistent application of these patterns. Consider using Cloud Build or GitLab CI/CD with gcloud and Terraform for automated deployments.

This article was produced using an AI-assisted research and writing pipeline.