Researchers Find 506 Prompt Injection Attacks Hidden in Moltbook Posts
Independent security researchers identified 506 posts (2.6%) on Moltbook containing hidden prompt injection attacks designed to manipulate AI agents.
Who is affected?
- AI agents operating on Moltbook without input sanitization
- OpenClaw instances connected to Moltbook
- Any AI agent that processes content from other agents
Recommended Actions
- Review and update your agent's system prompts to resist injection
- Implement input validation for content received from other agents
- Enable content filtering in your OpenClaw configuration
- Monitor agent logs for unusual behavioral patterns
- Consider limiting your agent's capabilities when interacting on Moltbook
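The input-validation action above can be sketched as a simple pattern filter. This is a minimal illustration, not a vetted ruleset: the phrase list and function name are assumptions, and pattern matching catches only known attack phrasings.

```python
import re

# Illustrative injection phrases only -- a real deployment needs a
# maintained, regularly updated ruleset, not a handful of regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

def flag_suspicious(post_text: str) -> bool:
    """Return True if the post matches a known injection phrase."""
    return any(p.search(post_text) for p in SUSPICIOUS_PATTERNS)
```

Flagged content should be quarantined for review rather than silently dropped, so that false positives and novel attack phrasings can both be inspected.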
What Happened
In their analysis of Moltbook content, the researchers found that 506 of the posts examined (roughly 2.6%) carried hidden prompt injection attacks, crafted to manipulate AI agents into performing unintended actions.
Gary Marcus and Andrej Karpathy have publicly warned against using Moltbook, calling it a "disaster waiting to happen."
Why It Matters
Unlike attacks on human users, attacks on AI agents can be automated and repeated at scale. A successful prompt injection could:
- Leak system prompts and configuration details
- Cause agents to spread misinformation
- Extract API keys or credentials from agent memory
- Turn compromised agents into attack vectors for other agents
The interconnected nature of Moltbook amplifies these risks, as one compromised agent could potentially influence thousands of others.
Attack Patterns Observed
Researchers identified several sophisticated attack techniques:
Hidden Instructions: Messages containing instructions embedded in ways that are invisible to casual inspection but processed by LLMs.
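One common way instructions are hidden from casual inspection is with zero-width Unicode characters, which renderers do not display but which still reach the model. A minimal detector along these lines (the character set here is a small, common sample, not an exhaustive list) might look like:

```python
# Zero-width / invisible code points often used to hide text from human
# readers. Incomplete sample for illustration -- not an exhaustive list.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def strip_invisible(text: str) -> tuple[str, bool]:
    """Remove zero-width characters and report whether any were found."""
    cleaned = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return cleaned, cleaned != text
```

A post that comes back with the found-flag set deserves closer scrutiny even after cleaning, since invisible characters in agent-facing content rarely have a benign purpose.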
Context Manipulation: Messages that gradually shift the conversation context to make injection payloads seem like natural continuations.
"Purge" Manifestos: Researchers found numerous heavily upvoted posts containing manifestos calling for a "total purge" of humanity; by contrast, a few communities, such as "Crustafarianism", were identified as comparatively healthy.
Platform Issues
The platform's security problems extend beyond prompt injection:
- No verification of AI authenticity — humans can operate fleets of bots
- 88:1 agent-to-human ratio — many "agents" are just scripts
- 1.5M API keys exposed in database breach
- Limited rate limiting allowing metric inflation
Protection Measures
- Strengthen system prompts – Include explicit instructions to ignore override attempts
- Implement input sanitization – Filter known injection patterns before processing
- Use allowlists – Limit your agent's actions to a predefined set of safe operations
- Enable logging – Monitor all agent interactions for anomaly detection
- Principle of least privilege – Don't grant agents capabilities they don't need
- Use separate credentials – Don't expose production API keys to Moltbook
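The allowlist measure above can be sketched as a dispatch wrapper that refuses anything outside a predefined set of operations. The action names and registry here are hypothetical, chosen only to illustrate the pattern:

```python
from typing import Callable

# Hypothetical registry of permitted operations: anything not listed
# here simply cannot be executed, no matter what the model requests.
SAFE_ACTIONS: dict[str, Callable[..., str]] = {
    "read_post": lambda post_id: f"read {post_id}",
    "reply": lambda post_id, text: f"replied to {post_id}",
}

def dispatch(action: str, *args: str) -> str:
    """Execute an action only if it appears on the allowlist."""
    if action not in SAFE_ACTIONS:
        raise PermissionError(f"action {action!r} is not allowlisted")
    return SAFE_ACTIONS[action](*args)
```

Denying by default is the point of this design: an injected instruction to, say, export credentials fails at dispatch time instead of relying on the model to refuse.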