Stealthy Prompt Injection in Images Lets Attackers Hijack AI Systems

Researchers have discovered a method for hiding malicious instructions in images that only become visible after downscaling—tricking large language models into executing unseen commands.

Leon Yen

Security researchers Kikimora Morozova and Suha Sabi Hussain of Trail of Bits have unveiled a novel multimodal attack that embeds malicious prompts inside high-resolution images. The prompts are invisible to the naked eye and emerge only when the image is downscaled, at which point unsuspecting AI systems process and act on them. The exploit builds on a 2020 USENIX Security paper from TU Braunschweig showing that artifacts introduced by image resampling can be weaponized against machine-learning pipelines.

Most AI systems automatically reduce the resolution of user-uploaded images using interpolation methods such as bicubic, bilinear, or nearest-neighbor resampling. This downscaling can unintentionally reveal patterns deliberately embedded in the original image. In the Trail of Bits demonstration, carefully crafted dark regions shifted to red after bicubic resampling, exposing hidden black text that the large language model then interpreted as part of the user's input.

Source: trailofbits.com.
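
To see the effect from the defensive side, an upload can be run through the same kind of resize a model pipeline applies, revealing what the model will actually receive. A minimal Python sketch using Pillow follows; the 512x512 target and bicubic filter are assumptions standing in for whatever a given service uses.

```python
# Sketch: preview what a multimodal model will receive after downscaling.
# The target size and bicubic resampling are assumptions; production
# pipelines vary by provider.
from PIL import Image

def downscale_preview(path: str, target=(512, 512)) -> Image.Image:
    img = Image.open(path).convert("RGB")
    # The same kind of resampling step an AI pipeline applies; text hidden
    # via aliasing in the full-resolution image only emerges at this scale.
    return img.resize(target, resample=Image.Resampling.BICUBIC)

if __name__ == "__main__":
    downscale_preview("upload.png").save("what_the_model_sees.png")
```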

The implications are serious. In one example, the attack exfiltrated Google Calendar data through the Gemini CLI, with the model invoking a Zapier integration without user approval because of lax permissions (trust=True, which suppresses tool-call confirmation prompts). From the user's point of view nothing looked unusual, yet the AI executed unauthorized actions based on the hidden instructions.
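
For teams running the Gemini CLI, one concrete check is to audit the local configuration for MCP servers that are allowed to run tools without confirmation. The sketch below assumes a Gemini CLI-style settings file at ~/.gemini/settings.json with an mcpServers map and a per-server trust flag; treat the path and field names as assumptions about your setup.

```python
# Sketch: flag MCP servers configured to run tool calls without user
# confirmation. The settings path and the "mcpServers"/"trust" fields are
# assumptions about a Gemini CLI-style configuration.
import json
from pathlib import Path

SETTINGS = Path.home() / ".gemini" / "settings.json"  # assumed location

def find_trusted_servers(settings_path: Path = SETTINGS) -> list[str]:
    config = json.loads(settings_path.read_text())
    servers = config.get("mcpServers", {})
    # A trusted server executes tool calls without approval, which is the
    # lax-permission condition the hidden-prompt exfiltration relied on.
    return [name for name, opts in servers.items() if opts.get("trust")]

if __name__ == "__main__":
    for name in find_trusted_servers():
        print(f"warning: MCP server '{name}' runs tool calls without confirmation")
```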

The attack was demonstrated successfully against several production AI systems, including the Gemini CLI, Vertex AI Studio, Gemini's web and API interfaces, Google Assistant on Android, and Genspark. Because it exploits how images are rescaled rather than any particular LLM, the technique is likely applicable to many other platforms. To support testing and awareness, the researchers released Anamorpher, an open-source tool (currently in beta) that generates images whose hidden payloads surface under specific downscaling algorithms.

To reduce the risk, Trail of Bits advises AI developers to limit the dimensions of uploaded images, show users a preview of how an image appears post-downscaling, and require explicit approval for any sensitive tool calls—especially when extracted image text is involved. Most importantly, they urge the adoption of secure-by-design patterns for AI systems that proactively mitigate prompt injection and related attacks.
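
That last recommendation amounts to a policy gate in front of the tool dispatcher. The sketch below is illustrative only; the ToolCall structure, the sensitive-tool list, and the provenance flag are hypothetical names, not part of any specific agent framework.

```python
# Sketch: require explicit approval for sensitive tool calls, especially when
# the triggering prompt text was extracted from an uploaded image. All names
# here are illustrative, not from a specific agent framework.
from dataclasses import dataclass

SENSITIVE_TOOLS = {"read_calendar", "send_email", "zapier_action"}  # assumed labels

@dataclass
class ToolCall:
    name: str
    args: dict
    from_image_text: bool  # set by the pipeline when image-derived text fed the prompt

def needs_approval(call: ToolCall) -> bool:
    # Sensitive tools always require approval; so does anything an image triggered.
    return call.name in SENSITIVE_TOOLS or call.from_image_text

def dispatch(call: ToolCall, user_approved: bool) -> None:
    if needs_approval(call) and not user_approved:
        raise PermissionError(f"tool call '{call.name}' requires explicit user approval")
    # ... hand off to the actual tool runner here ...
```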

This research highlights a growing frontier in prompt injection—image-based attacks that can bypass user awareness and security controls. In healthcare, where AI systems might process diagnostic images, patient-submitted media, or automate scheduling and communication via LLMs, this vector poses serious privacy and integrity risks.

To mitigate exposure:

  • Limit image handling by AI-driven tools to formats and dimensions that reduce vulnerability to rescaling attacks.
  • Strip or validate images using safe preprocessing pipelines before sending them to LLMs (a minimal sketch follows this list).
  • Require human approval for any tool actions triggered by extracted image content, especially in clinical workflows.
  • Adopt secure development patterns that treat all user input—including from images—as untrusted and potentially hostile.
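
For the first two items, a minimal approach is to downscale and re-encode every upload once, deterministically, so that the image a reviewer previews is byte-for-byte the image the model receives. The size cap and output format in this sketch are assumptions, not requirements drawn from the research.

```python
# Sketch: normalize uploads before they reach an LLM so the previewed image
# and the model input are identical. MAX_SIDE and PNG output are assumptions.
from io import BytesIO
from PIL import Image

MAX_SIDE = 768  # assumed cap on image dimensions

def normalize_upload(data: bytes) -> bytes:
    img = Image.open(BytesIO(data)).convert("RGB")
    # Single deterministic downscale and re-encode: any pattern that would
    # emerge from resampling emerges here, where it can be reviewed.
    img.thumbnail((MAX_SIDE, MAX_SIDE), resample=Image.Resampling.LANCZOS)
    out = BytesIO()
    img.save(out, format="PNG")
    return out.getvalue()
```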

Healthcare environments must now account for visual inputs as potential attack surfaces—not just structured text or code—when deploying AI systems in production.

SOURCES: trailofbits.com
Leon is a medtech and public health journalist based in San Francisco.