Protecting Agents from AI Cyber Espionage

A deep dive into prompt injection attacks and guardrail strategies for securing LLM-powered agents in production.

AI Security · AWS Bedrock · Blog

The challenge

After Anthropic published their GTG-1002 report — the first documented AI-orchestrated cyber espionage campaign — the question shifted from "what if" to "how do we stop this now?" Agents that call tools, read internal data, and run workflows are the new attack surface.

Approach

Pattern 1: Make prompt injection harder. Use denied topics, word filters, and prompt-attack detection on Bedrock Guardrails to block instruction-breaking prompts before they reach the model.
Pattern 2: Clean inputs and outputs. Detect poisoned context in RAG documents and chat history on the way in. Catch PII, secrets, and cross-tenant data leaks on the way out.
Pattern 3: Lock down tool-calling agents. Stricter guardrails on agents with API access — denied topics around destructive verbs combined with critical system names, plus logging guardrail decisions alongside tool calls.
Wired into a Strands SDK agent via Bedrock's guardrail_id and trace config — every request and response evaluated against the policy, with full trace for debugging and audits.

Key takeaway

Guardrails sit beside your system prompt as a policy firewall: the system prompt defines what the agent should do, guardrails define what it must never do. Published on the NewMathData blog, referenced OWASP GenAI Security Project and Anthropic's threat research.

View the project →