Vishing in 2026: Why Voice Phishing Is Now the Top Cloud Breach Vector
Google/Mandiant M-Trends 2026 ranks voice phishing the #2 initial breach vector worldwide and #1 in cloud environments. Here is how AI-driven TOAD kits and deepfake meetings are driving the surge, and why awareness training is the only defence that scales.

For the first time, the Google/Mandiant M-Trends 2026 report opens not with ransomware or zero-days, but with voice phishing. Vishing is now the second most common way attackers gain initial access overall — and the single most common vector in cloud environments. Firewalls cannot intercept a phone call; the only durable control is a workforce trained to recognise the attack.
Why M-Trends 2026 Leads With Vishing
The M-Trends 2026 analysis is built on more than 500,000 hours of frontline incident response conducted by Mandiant during 2025, combined with Google Threat Intelligence Group telemetry. That a report of this scale leads with voice phishing, rather than ransomware, exploits, or nation-state APTs, is itself the headline finding. As the report states: "We are tracking a significant shift toward voice-based social engineering (vishing), which has risen to the number two spot for initial infection vectors."
The shift is structural, not seasonal. Email phishing fell from 14% of intrusions in 2024 to just 6% in 2025, while vishing climbed to 11%. Email security infrastructure has matured — gateways, sandboxing, and link rewriting have made the inbox a harder target. A live human voice, patient and contextual, remains far harder to distrust. In cloud-specific compromises the imbalance is starker still: vishing accounts for 23% of incidents, surpassing stolen credentials (16%), email phishing (15%), and exploits (6%).
The 2025 Initial-Access Rankings
Mandiant's breakdown of confirmed 2025 intrusions shows vishing's rise alongside the decline of email phishing and stolen credentials. The report draws a sharp conceptual line between email phishing as a "non-interactive technical lure" and vishing as "interactive human engagement" — and that distinction is operational, because interactive attacks resist automated technical defences in ways static lures never did:
- Exploits (CVEs) — 32%, stable and #1 for the sixth consecutive year
- Voice phishing (vishing) — 11%, a significant increase and now the #2 vector
- Prior compromise — ~10%, up from #5 in 2024
- Stolen credentials — 9%, down from 16% in 2024
- Web compromise — 8%, stable
- Email phishing — 6%, down sharply from 14% in 2024
- Insider threat — 6%, up from 5% in 2024
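The year-on-year shifts in this ranking are simple arithmetic on the reported shares. The minimal Python sketch below uses only the figures stated above: 2024 shares are entered solely where the article gives one, and "Prior compromise" is entered as 10 although the report says roughly 10%. It ranks the 2025 vectors and computes percentage-point and relative change; email phishing's fall from 14% to 6%, for instance, is an 8-point drop, or roughly 57% of its former share.

```python
from typing import Optional, Tuple

# Shares of confirmed 2025 intrusions by initial-access vector (%), as listed above.
SHARES_2025 = {
    "Exploits (CVEs)": 32,
    "Voice phishing (vishing)": 11,
    "Prior compromise": 10,  # the report gives roughly 10%
    "Stolen credentials": 9,
    "Web compromise": 8,
    "Email phishing": 6,
    "Insider threat": 6,
}

# Only the 2024 shares the article states; missing years are left out, not guessed.
SHARES_2024 = {
    "Stolen credentials": 16,
    "Email phishing": 14,
    "Insider threat": 5,
}


def change(vector: str) -> Optional[Tuple[int, float]]:
    """Return (percentage-point change, relative change in %), or None if 2024 is unknown."""
    before = SHARES_2024.get(vector)
    if before is None:
        return None
    delta = SHARES_2025[vector] - before
    return delta, 100 * delta / before


if __name__ == "__main__":
    # Stable sort: vectors with equal shares keep the order listed above.
    ranked = sorted(SHARES_2025.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (vector, share) in enumerate(ranked, start=1):
        result = change(vector)
        trend = "" if result is None else f"  ({result[0]:+d} pp, {result[1]:+.1f}% relative)"
        print(f"{rank}. {vector:<26}{share:>3}%{trend}")
```

The relative figure is what makes the shift read as structural: email phishing lost more than half of its share in a single year, while stolen credentials lost close to half of theirs.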

Anatomy of an AI-Driven Vishing Attack
Lure Delivery — A notification with only a phone number
The attack begins with a TOAD (Telephone-Oriented Attack Delivery) message: a convincing brand notification — Google, Microsoft, Coinbase, or Binance — citing a locked account, a failed sign-in, or an unfamiliar login location. Crucially it contains no malicious link and no attachment, only a phone number. With nothing for a secure email gateway to detonate, the message sails through technical filtering and lands in the inbox.
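The same property that lets a TOAD lure slip past filtering (a brand alert with a callback number and nothing to detonate) can be turned into a crude signal. The Python sketch below is a hypothetical heuristic, not how any particular secure email gateway works. It flags a message that names one of the brands mentioned above, uses account-alert language, includes a phone number, and has no link or attachment. The brand list, alert terms, and phone-number pattern are illustrative assumptions. A regex this loose will also match dates and other digit runs, so treat the result as one scoring input rather than a verdict.

```python
import re
from dataclasses import dataclass, field

# Brands named in the article; a real deployment would maintain its own list.
BRANDS = ("google", "microsoft", "coinbase", "binance")
# Pretexts mentioned in the article: locked account, failed sign-in, unfamiliar login.
ALERT_TERMS = ("locked", "failed sign-in", "failed login", "unfamiliar", "unusual sign-in", "suspended")
# Loose phone-number pattern, e.g. "+1 (800) 555-0199" or "0800 555 0199".
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)


@dataclass
class Message:
    body: str
    attachments: list[str] = field(default_factory=list)


def looks_like_toad(msg: Message) -> bool:
    """Hypothetical heuristic: brand alert plus callback number, with no link or attachment."""
    text = msg.body.lower()
    has_brand = any(brand in text for brand in BRANDS)
    has_alert = any(term in text for term in ALERT_TERMS)
    has_phone = PHONE_RE.search(msg.body) is not None
    has_payload = bool(msg.attachments) or URL_RE.search(msg.body) is not None
    return has_brand and has_alert and has_phone and not has_payload


if __name__ == "__main__":
    lure = Message("Microsoft: your account is locked after 3 failed sign-in attempts. "
                   "Call +1 (800) 555-0199 to restore access.")
    print(looks_like_toad(lure))  # True
```

The design choice is to score what is absent as well as what is present: a legitimate alert from these brands normally points to a link on the brand's own domain, whereas the lure described above deliberately offers only a number to call.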
Callback Routing — The victim dials in
When the target calls the embedded number, platforms like ATHR route the call to either a human operator or an AI voice agent running on Asterisk WebRTC infrastructure — the same telephony stack used by legitimate call centres. The victim has now initiated contact, which lowers their guard: in their mind, they called the company, not the other way around.
AI Social Engineering — The agent works a script
ATHR's AI vishing agents follow a ten-step structured methodology: they authenticate the callback, describe a fabricated account irregularity, build urgency, and walk the victim toward surrendering a six-digit verification code, all without a human on the line. Because the agents carry the conversation, a single operator can run campaigns against multiple brands simultaneously, with each call adapting to the victim's responses in real time.
Live Credential Harvesting — Captured mid-call
While the voice interaction continues, ATHR's phishing interfaces capture credentials instantly. Operators watch each target as a live session and redirect them to tailored pages during the call. The MFA code spoken aloud is replayed against the real service before it expires — the account is taken over while the victim is still on the line believing they are being helped.
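Why does a code read aloud still work for the attacker? Assuming the service uses time-based one-time passwords (TOTP, RFC 6238) rather than, for example, SMS codes, the six-digit value depends only on a shared secret and the current 30-second time step. The Python sketch below implements the standard HMAC-SHA1 variant plus a verifier with a one-step drift tolerance, a common but not universal setting. The secret is the RFC's published test key, not real data.

```python
import hashlib
import hmac
import struct

STEP = 30   # RFC 6238 default time step, in seconds
DIGITS = 6  # six-digit codes, as in the attacks described above


def totp(secret: bytes, unix_time: int, step: int = STEP, digits: int = DIGITS) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) for the time step containing unix_time."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, unix_time: int, drift_steps: int = 1) -> bool:
    """Accept the code for the current step or up to drift_steps neighbouring steps."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + d * STEP), code)
        for d in range(-drift_steps, drift_steps + 1)
    )


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 6238 SHA-1 test key, not a real secret
    spoken = totp(secret, 59)          # code the victim reads out at t = 59 s
    print(spoken, verify(secret, spoken, 59 + 20))  # 287082 True
```

Nothing in `verify` knows who typed the code or from where, so a code relayed within the same window is indistinguishable from the legitimate user entering it. That is the gap the live-session replay exploits.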
Two Real-World Vishing Playbooks
Scenario 1 — The ATHR TOAD Kit Industrialises the Call
In April 2026, Abnormal AI researchers documented ATHR, a cybercrime platform sold on underground forums for $4,000 plus 10% of profits. It is the clearest demonstration to date of vishing being turned into a packaged product. ATHR consolidates the entire attack workflow into a single browser-based console, removing the need for an experienced social engineer and collapsing the cost of running high-volume campaigns:
- Lure delivery — integrated mailers generate fraudulent brand notifications with adjustable personalization: lock timeframes, failed-login counts, last-access locations, and IP addresses
- Callback routing — embedded numbers route victims to human operators or AI voice agents on Asterisk WebRTC infrastructure
- AI-driven social engineering — a ten-step agent authenticates the callback, invents account irregularities, and extracts six-digit codes with no human in the loop
- Live credential harvesting — phishing pages capture credentials in real time while operators monitor each victim as an active session
- Multi-brand scale — a single operator runs simultaneous campaigns against Google, Microsoft, Coinbase, Binance and more, systematically targeting finance teams, help desks, and IT administrators
Scenario 2 — UNC1069: Fake Meetings, Voice Capture, and Deepfakes
Validin researchers detailed UNC1069 (overlapping with North Korea's Bluenoroff) in April 2026, targeting cryptocurrency, Web3, and financial-services organisations. Attackers pose as venture-capital professionals on LinkedIn and Telegram, often from compromised accounts, then send Calendly links to counterfeit video-conferencing platforms that mimic Zoom, Google Meet, and Microsoft Teams. Mid-call, they claim the victim's mic or camera is broken and push ClickFix-style prompts to run commands and "fix" the issue. The decisive twist: these fake meeting interfaces record the target's audio and video, which are then reused to impersonate them in later operations — including deepfakes of executives. The voice channel serves simultaneously as the attack vector and an intelligence-gathering tool.
Why Vishing Works When Email Phishing Fails
Both playbooks exploit the same structural advantage: trust in real-time human contact. Email phishing asks the target to click a link with uncertain consequences; vishing puts a patient, contextual voice on the line that adapts to every answer. Automated gateways can inspect an attachment — they cannot detect persuasive pretexting delivered at a measured pace in a familiar regional accent. As the research frames it, email phishing relies on "volume and opportunistic delivery", while interactive vishing involves "a live person, or now, an AI, steering the conversation in real-time."
- Interactive, not static — a live voice adapts to hesitation and objections in ways a fixed email lure cannot
- No payload to detect — TOAD messages carry no link and no attachment, so secure email gateways have nothing to detonate
- Victim-initiated contact — the target dials the number, which disarms suspicion and frames the attacker as the trusted party
- AI removes the skill barrier — voice agents and cloning let inexperienced operators run convincing, large-scale campaigns
- Firewalls are blind to it — no perimeter control inspects a telephone call, so detection must move to the human
KEY TAKEAWAYS
- Vishing is now the #2 initial infection vector overall and #1 in the cloud: voice phishing accounts for 11% of all 2025 intrusions and 23% of cloud compromises
- Email phishing is collapsing as vishing rises: it fell from 14% of intrusions in 2024 to just 6% in 2025 as gateways hardened and attackers moved to the phone
- AI has industrialised the call: kits like ATHR automate the entire conversation for $4,000 plus 10% of profits, removing the need for skilled social engineers
- Voice is now an intelligence target: actors like UNC1069 record victims in fake meetings to fuel later deepfake impersonation
- Technical controls cannot stop a phone call: the only control that scales is a workforce trained against realistic voice attacks
How to Defend: Train People to Recognise the Call
Technical tooling stops known malware; it cannot stop a caller posing as IT support, or one demanding urgent sign-off on a wire transfer because the CFO is supposedly unreachable. The only sustainable defence is behavioural reinforcement through authentic simulation: exposing teams to the exact voice dynamics attackers use, in a safe environment, with feedback delivered at the moment of failure rather than in a distant quarterly session. An effective programme should include:
- A scenario library covering IT help-desk impersonation, executive fraud, HR callbacks, vendor impersonation, and MFA-bypass pretexts that mirror real attacker approaches
- Realistic voice options — standard AI voice, regional-accent customization, and voice-cloned executive impersonation to test resilience against the techniques UNC1069 already deploys
- Spoofed caller ID that mirrors internal or known-vendor numbers, because generic numbers test a lower-risk scenario than attackers actually use
- Real-time call monitoring and compliance measurement, showing which departments comply, which roles disclose credentials under stress, and which pretexts succeed
- Automatic post-call training delivered at the moment of failure, not weeks later
- Management dashboards giving CISO-level risk visibility broken down by department, role, and location
- AI-powered simulation for campaigns of 100+ users, or for testing voice-cloning and deepfake resilience
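The measurement and feedback items above come down to recording each simulated call and aggregating the outcomes. The minimal Python sketch below assumes a hypothetical `CallResult` record per call; it is not any product's data model. It computes disclosure rates by department, role, or pretext, and queues follow-up training as soon as a call ends in disclosure. The sample names and outcomes are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class CallResult:
    """One simulated vishing call (hypothetical record format, not a product schema)."""
    employee: str
    department: str
    role: str
    pretext: str     # e.g. "IT help desk", "executive fraud", "MFA bypass"
    disclosed: bool  # True if the employee gave up a password or verification code


def assign_training(result: CallResult, queue: List[Tuple[str, str]]) -> None:
    """Queue follow-up training as soon as a call ends in disclosure."""
    if result.disclosed:
        queue.append((result.employee, result.pretext))


def disclosure_rate(results: Iterable[CallResult], key: str) -> Dict[str, float]:
    """Fraction of calls (0 to 1) ending in disclosure, grouped by 'department', 'role' or 'pretext'."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for r in results:
        group = getattr(r, key)
        totals[group] += 1
        failures[group] += int(r.disclosed)
    return {group: failures[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Illustrative records only; names and outcomes are made up.
    results = [
        CallResult("a.lee", "Finance", "Accounts payable", "executive fraud", True),
        CallResult("b.khan", "Finance", "Controller", "IT help desk", False),
        CallResult("c.diaz", "IT", "Help desk", "MFA bypass", True),
        CallResult("d.ng", "HR", "Recruiter", "HR callback", False),
    ]
    training_queue: List[Tuple[str, str]] = []
    for r in results:
        assign_training(r, training_queue)
    print(disclosure_rate(results, "department"))  # {'Finance': 0.5, 'IT': 1.0, 'HR': 0.0}
    print(training_queue)
```

Grouping by a field name keeps one function serving every dashboard cut (department, role, pretext), and triggering training inside the per-call loop is what "at the moment of failure" means in practice.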
Key Takeaway
M-Trends 2026 confirms what frontline responders have felt for a year: the perimeter has moved to the human voice, and attackers — now armed with AI agents and voice cloning — have followed it there. Hardening the inbox simply pushed adversaries to the phone, where no firewall can follow. Organisations that treat vishing as a training problem, not a tooling problem, and that rehearse their people against realistic, AI-driven voice attacks, are the ones that will not be the next M-Trends statistic.
You cannot patch a phone call. When the attacker is a patient voice — or an AI imitating one your team already trusts — the only control that holds is a workforce that has heard the attack before and knows how to hang up.
Protect your executives from vishing and deepfake attacks
Arsen provides AI-powered phishing simulations, QR code attack testing, and executive-specific training to help teams rehearse against attacks like these.