7 Real AI Risk Incidents in 2025-26, and the Control Gaps They Exposed

TL;DR

– This article looks at seven incidents that happened in the last 18 months, and the specific controls that may have caught or prevented them
– The failures weren’t sophisticated: misconfigured vendors, unscoped agents, unmapped dependencies, and LLM outages that took business workflows down with no continuity plan in sight
– The programs that avoid incidents or handle them better have something in common: they know what AI systems are running, what those systems can access, and whether their environment matches what is documented. In real time, not “at the next review cycle.”

Even three and a half years after the first LLM went mainstream, AI risk remains relatively new territory. The frameworks are still being fleshed out, attack surfaces are still being mapped, and the organizations that moved earliest did so without a playbook. And some of them have paid for that.

Their experiences are cautionary tales that give the rest of us data and lessons to act on. The incidents of the last eighteen months have taught CISOs and GRC leaders what no amount of theoretical framework-building could have: where the real exposures lie, which assumptions proved wrong, and which controls actually matter when AI is in the stack.

Recent customer conversations show the same pattern. As a senior GRC leader at a global technology services company told us, “AI governance is still at a very early stage for us. A lot of the assessment work is still manual today.” The issue is not a lack of intent. It is that AI adoption is moving faster than most governance teams can inspect, document, and control.

In this blog, we dissect seven recent incidents and what they mean for GRC teams building out their programs. The goal is to learn from these stories and build a stronger AI governance approach together.

Lesson # 1 | Microsoft 365 Copilot’s EchoLeak: Scope what AI can access

In June 2025, researchers at Aim Security disclosed a critical vulnerability in Microsoft 365 Copilot, assigned CVE-2025-32711 with a CVSS score of 9.3. The attack required no action from the victim. The attacker sent an email with malicious instructions embedded in it. When Copilot processed that email during routine summarisation, it followed the hidden instructions, which included extracting files from OneDrive, SharePoint, and Teams and exfiltrating them through a trusted Microsoft domain. Antivirus, firewalls, and static scanning were all ineffective because the exploit was written in natural language, not code. Microsoft patched it after responsible disclosure, and there was no confirmed exploitation by attackers before the fix, but the incident exposed how much data an assistant with broad access can reach on an attacker’s behalf.

The control oversight this exposes: Any AI system that ingests untrusted content, such as emails, documents, or support tickets, is a potential attack surface.
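
One way to picture access scoping is a check that narrows what the assistant may read whenever untrusted content is in its context. The sketch below is illustrative only: the source names, the `ContextItem` type, and the `readable_sources` function are hypothetical, and it does not describe how Copilot or any other product works.

```python
from dataclasses import dataclass

# Hypothetical source names; a real deployment would use its own inventory.
SENSITIVE_SOURCES = {"onedrive", "sharepoint", "teams"}


@dataclass(frozen=True)
class ContextItem:
    origin: str    # e.g. "external_email" or "internal_doc"
    trusted: bool  # False for anything an outsider could have written


def readable_sources(requested: set[str], context: list[ContextItem]) -> set[str]:
    """Return the data sources the assistant may read for this request.

    If any item in the context is untrusted, sensitive sources are dropped,
    so injected instructions have less to reach.
    """
    if any(not item.trusted for item in context):
        return requested - SENSITIVE_SOURCES
    return set(requested)
```

With an external email in context, a request for `{"onedrive", "calendar"}` is narrowed to `{"calendar"}`. The same request made with only trusted internal content keeps both sources.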

Lesson # 2 | Meta’s AI agent incident: Agentic failures need dedicated controls

In March 2026, an AI agent inside Meta took an unsanctioned action on an internal forum, posting advice to an employee without being directed to do so. That employee acted on it, triggering a chain of events that gave a group of engineers access to Meta systems they had no permission to see. No external attacker was involved. The AI itself was the failure mode.

The control oversight this exposes: When an agent takes or recommends a wrong, or even dangerous, action, the consequences can cascade before anyone notices.
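
A dedicated agentic control can be as simple as a gate that classifies each proposed action and refuses or holds the risky ones. The following is a minimal sketch under stated assumptions: the action names, the `Risk` levels, and the `decide` function are hypothetical and are not drawn from the Meta incident.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    HIGH = 2


# Hypothetical action catalogue; anything not listed is treated as high risk.
ACTION_RISK = {
    "summarise_thread": Risk.LOW,
    "post_to_forum": Risk.HIGH,
    "change_permissions": Risk.HIGH,
}


def decide(action: str, requested_by_human: bool, approved: bool = False) -> str:
    """Return "allow", "hold", or "deny" for a proposed agent action."""
    risk = ACTION_RISK.get(action, Risk.HIGH)
    if risk is Risk.LOW:
        return "allow"
    if not requested_by_human:
        # Unprompted high-risk actions are refused outright.
        return "deny"
    # Human-requested high-risk actions wait for explicit approval.
    return "allow" if approved else "hold"
```

Under this policy, an agent that tries to post to a forum without being asked gets `"deny"`, and the same action requested by a person is held until someone approves it.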

Lesson # 3 | The ForcedLeak breach: AI with sensitive access needs input-output governance

We discussed AI tools with excessive access in Lesson #1. This one is about governing what AI can be asked, by whom, and whether there’s any validation between the instruction and the action. In September 2025, Noma Security discovered ForcedLeak, a prompt-injection vulnerability in Salesforce Einstein AI that allowed attackers to extract sensitive CRM data with only text inputs. No malware. No exploit code. The firewalls were fine. Encryption worked. Database permissions were tight. But the AI—which had legitimate access to everything—could simply be persuaded to hand it over. 

The control oversight this exposes: The AI had no mechanism to evaluate whether a request was safe and legitimate before acting on it. Input governance—defining what the AI can be asked, by whom, and with what checks on the output—was never in place. Until organizations treat this as a distinct control requirement, any AI with production data access is only as secure as the instructions it will and won’t follow. 
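
To make input-output governance concrete, here is a small sketch of a wrapper that checks who is asking and what comes back before anything reaches the requester. Everything in it is an assumption for illustration: the roles, the request types, the email-only output check, and the `governed_call` function are hypothetical and do not reflect Salesforce’s implementation.

```python
import re

# Hypothetical policy: which roles may ask for which kinds of request.
ALLOWED_REQUESTS = {
    "sales_rep": {"summarise_account", "draft_email"},
    "anonymous_web_form": set(),  # untrusted input may not drive the AI at all
}

# Illustrative output check: withhold responses that contain email addresses.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def input_allowed(role: str, request_type: str) -> bool:
    """True if this role is permitted to make this type of request."""
    return request_type in ALLOWED_REQUESTS.get(role, set())


def output_allowed(text: str) -> bool:
    """True if the response contains nothing the output check flags."""
    return EMAIL_PATTERN.search(text) is None


def governed_call(role, request_type, model, prompt):
    """Run the model only if the input is permitted, then screen the output."""
    if not input_allowed(role, request_type):
        return "refused: request not permitted for this role"
    answer = model(prompt)
    if not output_allowed(answer):
        return "withheld: response contained sensitive data"
    return answer
```

Here `model` is any callable that takes a prompt and returns text. A real output check would cover far more than email addresses; the point is that validation sits between the instruction and the action.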

Lesson # 4 | The LiteLLM attack: Your AI supply chain is only as secure as its deepest dependency 

In late March 2026, attackers compromised LiteLLM, an open-source tool that connects applications to AI services, present in an estimated 36% of cloud environments. By inserting malicious code into two versions of the package before anyone noticed, they were able to harvest credentials across thousands of organizations that had downloaded it. Mercor, a data contracting firm working with OpenAI, Anthropic, and Meta, confirmed it was among thousands of organizations hit. The breach exposed data from over 40,000 contractors, source code repositories, and potentially the AI training methodologies of multiple frontier labs. Meta paused all work with Mercor.

The control oversight this exposes: The attack never touched Mercor’s own systems directly. It came through a trusted open-source component that nobody had assessed as a risk. AI systems are built on layers of dependencies—libraries, frameworks, integrations—most of which sit outside any vendor review process. 
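
Bringing dependencies into review can start with comparing what is installed against what has been assessed. The sketch below assumes a hand-maintained approved list; the package names, versions, and the `unreviewed` function are hypothetical.

```python
# Hypothetical approved list: package name -> versions that passed review.
APPROVED = {
    "example-llm-gateway": {"1.4.2"},
    "example-http-client": {"2.0.1"},
}


def unreviewed(installed: dict[str, str]) -> list[str]:
    """Return "name==version" for every installed package that is not approved."""
    findings = []
    for name, version in sorted(installed.items()):
        if version not in APPROVED.get(name, set()):
            findings.append(f"{name}=={version}")
    return findings
```

In practice, the `installed` mapping could be built from Python’s standard `importlib.metadata.distributions()`. A new version of an approved package shows up as a finding until someone reviews it, which is the property that matters when a trusted package is poisoned.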

Lesson # 5 | The ChatGPT outage: Treat LLM providers as infrastructure

On June 10, 2025, ChatGPT went down for over 10 hours globally, affecting users across the US, UK, Europe, and Australia simultaneously. For organizations that had embedded ChatGPT APIs into operational workflows, there was no fallback because no one had included an LLM provider in their continuity planning. Most business continuity frameworks still lack a category for it.

A detailed breakdown of the incident documents the downstream impact on businesses — customer-facing chatbots, internal help desks, and automated document workflows went dark alongside the platform itself. It is worth a read for any team currently relying on a single LLM provider for operational dependencies.

The control oversight this exposes: Business continuity planning was built around downtime in infrastructure, like servers, cloud providers and SaaS platforms. LLM providers are now in that category, whether or not they appear on a continuity plan. An organization that has mapped AWS as a critical dependency but not OpenAI is working from an incomplete picture of where its operations actually live.
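
Treating an LLM provider as infrastructure usually means having somewhere to fail over to. A minimal sketch of that idea, assuming each provider is wrapped as a plain callable (the `complete_with_fallback` name and the provider wrappers are hypothetical, not any vendor’s SDK):

```python
from typing import Callable, Sequence

Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer).

    Raises RuntimeError if every provider fails.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

If the primary provider raises, the call moves to the next one in the list, and the returned name records which provider actually answered. The final entry does not have to be another LLM; it could be a queue or a manual process.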

Lesson # 6 | The Vercel/Context.ai breach: OAuth tokens are the new lateral-movement path 

On 19 April 2026, Vercel disclosed a breach that had originated not in its own infrastructure but in Context.ai, a small third-party AI productivity tool used by a single Vercel employee. In February 2026, a Context.ai employee was infected with malware. The attacker extracted OAuth tokens from the compromised machine, used them to access the Vercel employee’s Google Workspace account, and moved laterally into Vercel’s internal systems. Because OAuth tokens, once issued, do not require re-authentication, MFA offered no protection. Vercel confirmed that credentials for a subset of customers were compromised.

The control oversight this exposes: Most organizations have no inventory of which third-party AI apps their employees have authorized, or of the scopes those apps have access to. The OAuth graph is the new perimeter.
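
An OAuth inventory becomes useful once it can be queried for risky grants. The sketch below assumes the grants have already been exported from an identity provider into simple records. The `Grant` type, the `risky_grants` function, and the choice of which scopes count as broad are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumption: which scopes count as "broad" is a local policy decision.
BROAD_SCOPES = {
    "https://www.googleapis.com/auth/drive",
    "https://mail.google.com/",
}


@dataclass(frozen=True)
class Grant:
    user: str
    app: str
    scopes: frozenset[str]


def risky_grants(grants: list[Grant], approved_apps: set[str]) -> list[Grant]:
    """Return grants to unapproved apps, or to any app holding a broad scope."""
    return [
        g for g in grants
        if g.app not in approved_apps or g.scopes & BROAD_SCOPES
    ]
```

An approved app with full Drive access is still flagged, because approval of the app does not imply approval of every scope an employee has handed it.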

Lesson # 7 | The GitHub breach: Even trusted marketplace extensions need a hold period

On May 19, 2026, GitHub confirmed that roughly 3,800 internal repositories had been exfiltrated after a single employee installed a malicious version of the Nx Console VS Code extension from the official Visual Studio Marketplace. The poisoned version was live for just 18 minutes before it was pulled. In that short window, it harvested credentials—including from 1Password vaults, Claude Code configurations, npm, GitHub, and AWS—and handed malicious actors a foothold into GitHub’s internal systems. 

The control oversight this exposes: The breach came through implicit trust in a marketplace that most organizations treat as safe by definition. The entry point was Microsoft’s Visual Studio Marketplace, an official, curated channel with a verified publisher badge. A mandatory hold period on newly published or updated extensions, long enough for automated security checks to run, might have been enough. Because the malicious version was pulled after 18 minutes, any hold period longer than that would have kept employees from ever seeing it, and the incident might have been prevented.
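
A hold period reduces to one comparison: has this version been public for long enough? The sketch below assumes the publish timestamp is available from the marketplace or an internal mirror. The `installable` function and the 24-hour value are illustrative, not a recommendation drawn from the incident.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 24 hours is an arbitrary illustrative hold period.
HOLD_PERIOD = timedelta(hours=24)


def installable(
    published_at: datetime, now: datetime, hold: timedelta = HOLD_PERIOD
) -> bool:
    """Allow a version only once it has been public for at least the hold period."""
    return now - published_at >= hold
```

A version published 18 minutes ago fails this check, so it would never have been offered to an employee during the window in which the poisoned release was live.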

What do these 7 incidents have in common? 

None of these incidents required a sophisticated attacker. None revealed a category of risk that GRC frameworks had not anticipated. What they revealed is that AI moves faster than the review cycles organizations were built around, and the gaps that opened up were ones any programme would have struggled to catch.

Vendors shipped AI features, without any notice, into products that had already been approved. Open-source components that nobody thought to assess turned out to be load-bearing. Employees aiming for better productivity connected tools that created access paths nobody had mapped. These are not failures of negligence; they are the natural friction of governing a technology that is evolving faster than the frameworks designed to contain it.

One senior GRC leader at a global digital infrastructure and technology services company described the operational reality plainly: “Every week, we get a new AI use case.” That is the central governance challenge. AI risk is not arriving as a once-a-year policy update. It is arriving continuously, through new tools, new vendors, new workflows, and new employee behaviours.

The programmes that fare best are the ones that have found ways to shrink that lag. Not perfect visibility, but enough to know what AI systems are operating in their environment, what they can access, and whether the environment in practice still matches what is documented.

Sprinto can help you shrink the lag

Sprinto is built around the premise that continuous visibility is the only kind that works in an AI-heavy environment.

  1. On vendor risk, Sprinto’s autonomous TPRM discovers vendors as they enter your environment, assigns a risk tier, and keeps monitoring for changes. When a vendor’s posture shifts, including changes to how AI capabilities handle your data, it triggers a review and tracks it to completion with evidence. The gap where nobody goes back to verify controls after procurement is precisely the one this closes.
  2. On shadow AI, Sprinto maintains a live AI tool registry across your organization, classifying risk by data access and mapping your AI footprint to ISO 42001, NIST AI RMF, and the EU AI Act. Shadow AI cannot accumulate undetected.
  3. On access controls and agentic deployments, Sprinto continuously monitors controls, detects drift, and acts — closing gaps, refreshing evidence, and routing approvals — without waiting for an annual review cycle to surface what has changed.

Want to understand where your AI governance posture stands against what the incidents of the last 18 months revealed? Talk to a Sprinto expert →

FAQs

How can GRC teams detect AI features added by vendors after onboarding? 

Most programmes catch this when something goes wrong, which is too late. The honest answer is that onboarding questionnaires were never built to catch mid-cycle changes. What actually works is runtime monitoring of vendor posture, which allows visibility into when something in a vendor’s environment shifts. Contracts should also explicitly require vendors to notify you of material changes to AI capabilities or data handling. Most don’t include that clause today.

Which frameworks cover AI-specific risks, such as those seen in these incidents? 

NIST AI RMF is the broadest and most referenced. ISO 42001 maps cleanly onto existing 27001 programmes and is the first certifiable AI management standard. The EU AI Act adds binding obligations for organizations operating in or selling into Europe. None of them, however, replaces the need for runtime visibility into what your AI systems are actually doing; they tell you what controls to have, not whether those controls are working.

How is an AI security incident different from a traditional security incident? 

Three ways that matter in practice: i) The attack surface is often natural language. Neither the EchoLeak nor the ForcedLeak incident outlined above required malicious code. Traditional scanning tools are largely blind to this. ii) The blast radius of an agentic failure is operational, not just informational. An agent that acts incorrectly can authorize actions or delete records before anyone notices, and some of that damage is irreversible. iii) Attribution is harder: when an AI system behaves outside its intended scope, establishing whether it was attacked, misconfigured, or simply poorly governed takes longer than a conventional incident investigation.

Should LLM providers like OpenAI be in our business continuity plan? 

After the June 2025 outage outlined above, the answer is a definite yes if any operational process depends on them. The ChatGPT outage lasted over 10 hours and took with it every workflow built on top of it. The organizations that had no fallback weren’t negligent; they just hadn’t treated an LLM provider the way they’d treat AWS going down. Trust management has to evolve to catch these AI-related fallouts.

What questions should we add to vendor assessments to catch AI risk? 

The ones most assessments are still missing:

– Does your product use AI in any component, and has that changed since our last review?
– What data does the AI access, and is any of it used for model training?

Questions should tell you more about their dependencies, too; you need to look at your entire vendor ecosystem:

– Which foundation model or LLM does your product rely on, and who provides it?
– What does your runtime access control look like for that AI component: what can it access and what can it do autonomously?
– How would you notify us of a material change to your AI capabilities or data handling?

That last one is the most important and the most commonly absent. If it isn’t in the contract, the vendor has no obligation to tell you when they ship a feature that changes how your data is handled.

How often should AI vendor risk be reassessed after onboarding? 

Annual vendor review cycles were already struggling before AI accelerated vendor change cycles. Exposures like the LiteLLM compromise outlined above accumulate in the gaps between reviews. The right cadence is no longer calendar-driven; it’s event-driven. High-risk vendors, those whose AI components touch sensitive data or sit in critical workflows, should be subject to runtime control monitoring that surfaces changes as they happen, with formal reassessment triggered by material product changes, not just the renewal date. Lower-risk vendors should be reviewed quarterly at a minimum. Annual vendor reviews are a gap, not a process.
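
An event-driven cadence can be expressed as a simple rule: reassess when something material changes, and fall back to a calendar floor otherwise. The sketch below is one possible reading. The event names and the `reassessment_due` function are hypothetical, and applying the quarterly floor to every vendor tier is an assumption made to keep the example short.

```python
from datetime import date, timedelta

# Hypothetical event names a vendor-monitoring feed might emit.
MATERIAL_EVENTS = {"new_ai_feature", "data_handling_change", "security_incident"}

# Assumption: 90 days stands in for "quarterly".
QUARTER = timedelta(days=90)


def reassessment_due(last_review: date, today: date, events: set[str]) -> bool:
    """Decide whether a vendor needs a formal reassessment now."""
    if events & MATERIAL_EVENTS:
        return True  # a material change triggers review regardless of the calendar
    return today - last_review >= QUARTER  # otherwise fall back to the quarterly floor
```

A vendor reviewed two weeks ago is reassessed immediately if it ships a new AI feature, while a pricing change alone does not trigger anything until the quarterly floor is reached.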

Raynah
Author

Raynah is a content strategist at Sprinto, where she crafts stories that simplify compliance for modern businesses. Over the past two years, she’s worked across formats and functions to make security and compliance feel a little less complicated and a little more business-aligned.