The Ghost in the Machine: What 20 Years in Infosec Taught Me About AI Agent Security
- May 19, 2026
- Shawn Evans
In my twenty years in information security, I’ve witnessed the rise and fall of countless technologies. I’ve seen “silver bullets” come and go, but I have never encountered a technology that evolves at the blistering pace of Large Language Models (LLMs). We are no longer watching a standard software lifecycle; we are watching a weekly metamorphosis.
At NopSec, we have performed extensive security research and assessments against complex AI agent architectures utilizing Model Context Protocol (MCP) tools—including our own in-house solutions. What we observed was a sobering reminder that while the technology is cutting-edge, the vulnerabilities remain classic: over-trust, permissive defaults, and the dangerous fallacy of treating a language model as a security firewall.
When assessing AI agents, we cannot look at the LLM in isolation. We must evaluate the Agentic Workflow—the ecosystem of “tools” and “servers” that allow the AI to interact with the real world (databases, file systems, and APIs).
Our assessment methodology focuses on the “Handshake” between the AI and its tools:
The most striking discovery in our recent research wasn’t a technical bug, but a behavioral one inherent to many modern LLMs: Sycophancy. Most models are fine-tuned to be as helpful as possible. While this makes for a great user experience, it creates a massive security loophole.
We found that even with robust “system prompts” designed to prevent malicious activity, these guardrails are often “soft” constraints. By adopting the persona of an authorized but “stressed” employee, or framing a malicious request as a “critical system fix,” we were able to convince the AI to bypass its own safety training. Once “convinced,” the AI acted as an unauthenticated proxy, executing privileged commands and exfiltrating data—all while believing it was simply being helpful.
This emerging industry currently relies too heavily on the LLM as a primary security layer. Our research showed that AI judgment is a supplement to, not a replacement for, rigorous system controls.
In many environments, the last line of defense is the AI’s ‘ethics.’ When that fails—as it inevitably does under pressure—there are often no technical backstops. If an AI tool is ‘open-ended’ (e.g., capable of running arbitrary system commands), once the LLM is subverted, an attacker gains the full permissions of the MCP tools that connect and extend the agent’s reach across the enterprise.
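To make the idea of a technical backstop concrete, here is a minimal sketch of an authorization check that runs in code, outside the model, before any tool call is dispatched. Everything in it is hypothetical and for illustration only: the tool names, the roles, `TOOL_POLICY`, `ToolCall`, and `authorize` are not part of MCP or of any assessed environment. The assumption that matters is that the caller's role comes from the authenticated session, never from anything the model or the prompt says.

```python
from dataclasses import dataclass

# Hypothetical allowlist: tool name -> roles permitted to invoke it.
# There is deliberately no open-ended "run_shell" style tool here.
TOOL_POLICY = {
    "read_ticket": {"analyst", "admin"},
    "restart_service": {"admin"},
}


@dataclass(frozen=True)
class ToolCall:
    tool: str         # tool name requested by the model
    caller_role: str  # role of the authenticated user, set by the platform


class ToolDenied(Exception):
    """Raised when a requested tool call is not permitted."""


def authorize(call: ToolCall) -> None:
    """Raise ToolDenied unless the tool is allowlisted for the caller's role."""
    allowed_roles = TOOL_POLICY.get(call.tool)
    if allowed_roles is None:
        # Unknown tools are denied by default.
        raise ToolDenied(f"unknown tool: {call.tool}")
    if call.caller_role not in allowed_roles:
        raise ToolDenied(f"role {call.caller_role!r} may not call {call.tool}")
```

With a check like this in the dispatch path, a model that has been talked into “helping” can still only request tools the authenticated user was already entitled to use. The persuasion succeeds against the LLM but fails against the platform.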
The most encouraging part of our research is that we’ve seen what “good” looks like. In the same environments where open-ended tools were easily exploited, we identified “Gold Standard” patterns that remained resilient:
As new language models are released at breakneck speed, our security strategies must become model-agnostic. We cannot assume that “Version 4.7” will be safer than “Version 4.5.” Instead, we must harden the environment the AI inhabits.
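As one illustration of hardening the environment rather than the model, the sketch below confines a hypothetical file-reading tool to a single directory. The function name `resolve_inside` and the `/srv/agent-data` path in the usage note are assumptions made for this example. The point is that the check behaves the same no matter which model version produced the request.

```python
from pathlib import Path


def resolve_inside(base_dir: str, requested: str) -> Path:
    """Return the resolved path if it stays inside base_dir.

    Raises PermissionError when the requested path would escape base_dir,
    for example through '..' segments or an absolute path.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute 'requested' discards 'base', so the
    # containment check below also rejects absolute paths.
    candidate = (base / requested).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError(f"path escapes sandbox: {requested}") from None
    return candidate
```

A tool handler would call `resolve_inside("/srv/agent-data", user_supplied_path)` before opening anything, so a subverted model asking for `../../etc/passwd` gets an error instead of a file.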
The speed of AI advancement is breathtaking, but my 20 years in this field have taught me that fundamentals always win: Trust, but verify; and then, don’t trust at all. The goal for modern organizations shouldn’t just be to build a smarter AI, but to build a more resilient platform that remains secure even when the AI makes a mistake. In the world of Agentic AI, the most dangerous vulnerability isn’t a line of code—it’s the assumption that the machine can secure itself.