Blog

Thoughts, updates, and insights from the Superagent team.


Red Teaming • December 9, 2025 • 5 min read

Red Teaming AI Agents: What We Learned From 50 Assessments

After red teaming 50 AI agents across different companies, industries, and setups, we've identified critical patterns that teams need to understand. Here's what actually matters when securing AI agents in production.

Benchmarks • December 3, 2025 • 3 min read

Open Source AI Models: A Safety Score Reality Check

The open source AI movement has democratized access to powerful language models, enabling developers and organizations to deploy sophisticated AI systems without vendor lock-in or prohibitive costs.

Security • December 1, 2025 • 4 min read

Your System Prompt Is the First Thing Attackers Probe

When attackers target AI agents, they don't start with sophisticated exploits. They start by probing the system prompt—the instructions that define your agent's behavior, tools, and boundaries.

Guardrails • November 24, 2025 • 2 min read

Your RAG Pipeline Is One Prompt Away From a Jailbreak

RAG is marketed as a safety feature, but connect it to agents that browse, call APIs, or touch databases, and every document becomes a potential jailbreak payload. Learn how malicious files, knowledge base poisoning, and indirect prompt injection turn RAG into an attack surface—and how to defend against it.

Security • November 20, 2025 • 5 min read

Practical guide to building safe & secure AI agents

System prompts aren't enough to secure AI agents. As agents move from chatbots to systems that read files, hit APIs, and touch production, we need real runtime protection. Learn how to defend against prompt injection, poisoned tool results, and the 'lethal trifecta' with practical guardrails.

Research • November 19, 2025 • 2 min read

AI Is Getting Better at Everything—Including Being Exploited

As AI models become more capable and obedient, safety improvements struggle to keep pace. The GPT-5.1 safety score drop reveals a structural problem: capability and attack surface scale faster than safety.

