How we apply security controls to generative AI systems on AWS

How we apply security controls to generative AI systems on AWS

By the time generative AI reaches production, most teams already have a security stack in place.

IAM, network boundaries, logging, monitoring, encryption, incident response. None of that stops being relevant just because a large language model enters the architecture.

The real work is figuring out where those controls still apply, where they don’t, and where the model simply can’t help you.

When we build generative AI systems on AWS, we don’t start by inventing new security frameworks. We start by mapping the controls we already trust onto this new architecture and being very clear about what the model is not responsible for.

The model is not a security boundary

Foundation models are powerful, but they have no idea who your users are.

They don’t understand roles, data classifications, or document permissions. If sensitive data is included in a prompt, the model has access to all of it. There’s no row-level access control or entitlement check happening inside the model.

That leads to one simple rule:

If access control matters, it has to happen before data reaches the model.

Once teams accept that, the design starts to make more sense.

IAM is still critical, but it’s only part of the picture

On AWS, IAM controls which services and roles can invoke model APIs or access supporting infrastructure. That part doesn’t change.

But IAM mostly handles system-to-system permissions. It doesn’t decide what a particular end user should or shouldn’t see in a response.

That logic lives in the application layer.

In real systems, users are authenticated, mapped to roles or attributes, and only authorised data is ever retrieved or included in prompts. If someone shouldn’t have access to a document, the system should make sure the model never sees it in the first place.

Retrieval is a security control, not just a relevance feature

In retrieval-augmented systems, it’s easy to focus only on search quality and ranking.

From a security perspective, retrieval is one of the strongest enforcement points in the whole system.

Vector databases and search layers need to apply the same access rules as any other data platform. Queries should be filtered based on user entitlements, and only authorised content should come back. The model then works with data that has already passed access control.

This is one reason retrieval-based designs are often safer than pushing large volumes of sensitive data into fine-tuning. Access control stays visible and testable instead of disappearing inside a model.

Prompt injection is a familiar problem in a new place

Prompt injection gets a lot of attention because it sounds like a brand-new risk. In practice, it behaves like an old one.

Untrusted input is being interpreted by a powerful downstream component.

The defensive approach looks very familiar:

  • Treat all user input as untrusted

  • Keep system instructions separate from user-provided content

  • Avoid building prompts that mix instructions and data without clear boundaries

  • Apply limits and validation to what users can submit

The target happens to be a model, but the mindset is still application security.

Network and infrastructure controls still do their job

Generative AI doesn’t change the basics of infrastructure security.

We still use:

  • Network segmentation and private connectivity where possible

  • Encryption in transit and at rest

  • Strong key management and rotation

  • Secure handling of secrets and API credentials

Model invocation is just another dependency from the application’s perspective. It should be protected the same way you would protect any other external service.

Logging and monitoring matter even more

You can’t open up a model and inspect why it produced a specific answer.

That makes the surrounding signals more important.

We focus on things like:

  • Model invocation patterns

  • Token usage and unusual spikes

  • Retrieval queries that look odd or out of scope

  • Application-level errors and fallback behaviour

Those signals are what let you spot misuse, abuse, or unexpected behaviour without needing insight into the model’s internals.

Incident response needs to cover AI-specific scenarios

Your existing incident response process still applies, but the scenarios expand.

You now need to be ready for things like:

  • The model producing unsafe or inappropriate output

  • Sudden spikes in usage that hit quotas or drive unexpected cost

  • A data source feeding the model being compromised

These aren’t theoretical. They’re situations you want runbooks and decision owners for before they happen.

The goal is clarity, not complexity

Adding generative AI to your environment doesn’t mean throwing out your security architecture.

Most of the controls you need are already there. The work is putting them in the right places and being honest about what the model cannot do for you.

When identity, access control, network protection, and monitoring are applied deliberately around the model, generative AI becomes another workload to secure, not a special case that needs a completely new rulebook.

Final thoughts

Security issues in generative AI systems rarely come from the model doing something exotic.

They usually come from treating the model like a trusted component, assuming it can enforce rules it doesn’t understand, and forgetting to apply the controls that already work elsewhere in the stack.

When we deploy generative AI systems on AWS, the biggest shift isn’t new tooling. It’s making sure familiar controls sit in the right layers, so the model only ever sees what it’s meant to.