Inaudible audio attacks can hijack AI voice models, study finds

Inaudible Audio Attacks Expose AI Voice Model Weakness

Inaudible audio attacks can hijack AI voice models, according to research from Zhejiang University that found imperceptible commands could steer large audio-language models with success rates as high as 96%.[4] The study matters because it shows voice systems can be manipulated through audio that sounds normal to humans, including clips that may be embedded in podcasts, videos or live calls.[4][9]

Overview

Researchers tested AudioHijack on 13 state-of-the-art audio-language models, showing the attack worked across multiple architectures and scales.[4] This suggests the risk is not confined to one vendor or model family.
The framework achieved average success rates of 79% to 96%, depending on the model and test setup.[4] That level of reliability raises the likelihood of practical abuse if the method is adapted for broader deployment.
The attack remained effective even when user instructions conflicted with the hidden payload.[4] That weakens a common assumption that downstream prompts can override malicious audio.
Researchers said the hidden commands were imperceptible to human listeners.[4][9] That creates detection challenges for users and moderation systems that rely on audible review.
The team reported that some standard defenses stopped only a small fraction of attempts.[1] That suggests current mitigations offer limited protection against optimized adversarial audio.
The study also said the technique transferred from open models to commercial voice AI from Microsoft and Mistral.[1] That widens concern beyond academic benchmarks and into deployed products.

Inaudible audio attacks can hijack AI voice models

The research, first reported alongside a peer-reviewed presentation track at IEEE’s security conference, describes “auditory prompt injection,” meaning hidden commands embedded in audio waveforms.[4][9] In practice, the payload is not something a person hears; it is a signal the model can parse as an instruction.[4]

That distinction matters for product security. Voice assistants and multimodal AI tools are increasingly being connected to actions such as search, messaging, file handling and other external tools. If a model can be nudged by an invisible audio layer, the attack surface expands from text prompts to any audio the system ingests.[4][9]

Researchers said the test set included 13 open-source models, and the attack produced misbehavior ranging from simple refusals to tool misuse, including web searches and emails containing personal data.[4] The broader point is that the issue is not limited to hallucinations or bad transcription. It is a control problem.

What the study found

Finding	Verified detail	Why it matters
Models tested	13 audio-language models	Suggests the issue spans multiple architectures.[4]
Success rate	79% to 96% average	Indicates the attack can be highly reliable under lab conditions.[4]
Human detectability	Imperceptible to humans	Makes manual screening ineffective.[4][9]
Defense performance	Only a small fraction blocked	Current mitigations appear incomplete.[1]

Researchers said the attack could be delivered through ordinary-looking media such as online videos, music clips, voice notes or audio from Zoom calls uploaded to transcription services.[1][4] That makes the threat operationally relevant for consumer apps, enterprise collaboration tools and any workflow where voice is automatically analyzed.

Commercial implications for AI voice systems

Market participants view the research as a warning for vendors that are racing to add voice interfaces without fully hardening the ingestion layer. The immediate issue is trust: if users believe an assistant can be steered by hidden audio, adoption in workplace and consumer settings may face a security discount.[4][9]

The findings also underscore a competitive pressure point. Companies that rely on open-source components or shared audio pipelines may have less room to argue their systems are insulated from the problem.[1] The study said the attack transferred to commercial voice AI from Microsoft and Mistral, which suggests that downstream integration can matter as much as the base model itself.[1]

Exposure area	Risk described in study	Practical implication
Open-source models	Successfully hijacked	Baseline security assumptions are weak.[4]
Commercial voice AI	Transfer observed	Vendor due diligence becomes more important.[1]
Audio uploads	Hidden payload delivery	Podcasts, calls and media pipelines may need screening.[1][4]

Analysts note that the downside scenario is straightforward: if attackers can package malicious audio inside ordinary content, enterprises may need to treat any machine-processed audio as potentially hostile. The uncertainty is equally clear. The reported results are based on a specific research framework and lab-tested models, and the study does not by itself prove mass exploitation in the wild.[4]

Even so, the direction of risk is difficult to ignore. As voice models gain access to more tools and more personal data, the cost of a successful hidden-audio attack rises. That leaves vendors with a familiar but urgent task: harden the front end before voice becomes a larger operational channel for AI systems.[1][4]
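
One way to picture what treating machine-processed audio as potentially hostile could mean at the tool layer is a provenance check in front of sensitive actions. The sketch below is an illustration only and is not a defense described or evaluated in the study. The tool names, the `Source` labels and the `is_allowed` helper are all hypothetical, and the approach does not detect hidden audio; it only limits what a hijacked model could do without explicit user confirmation.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where the turn that triggered a tool call came from (hypothetical labels)."""
    USER_TEXT = "user_text"  # typed directly by the user
    AUDIO = "audio"          # derived from any ingested audio


# Hypothetical tool names, loosely mirroring the search, messaging
# and file-handling actions mentioned in the article.
SENSITIVE_TOOLS = {"web_search", "send_email", "read_file"}


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    triggered_by: Source
    user_confirmed: bool = False


def is_allowed(request: ToolRequest) -> bool:
    """Permit sensitive tools on audio-derived turns only after confirmation."""
    if request.tool not in SENSITIVE_TOOLS:
        return True  # non-sensitive tools pass through unchanged
    if request.triggered_by is Source.USER_TEXT:
        return True  # typed input is outside this particular check
    return request.user_confirmed  # audio-derived: require explicit sign-off


if __name__ == "__main__":
    print(is_allowed(ToolRequest("send_email", Source.AUDIO)))
    print(is_allowed(ToolRequest("send_email", Source.AUDIO, user_confirmed=True)))
```

The design choice here is to assume the model may already be compromised and to constrain its actions, rather than to rely on filtering the audio itself, which the study suggests current defenses do poorly.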
