The Wall Street Journal recently ran an entertaining piece about letting Claude Sonnet run a small office shop (Anthropic's Project Vend). Through a combination of ingenuity and tenacity, journalists convinced the chatbot to stock a PS5, embrace communism, and give away its inventory for free. The experiment ended, predictably, with a loss.
It's an important lesson in the dangers of letting AI loose in the wild, and if you're worried about today's systems paving the way to AI dominance, you can take comfort: humans are entirely capable of bamboozling them.
Case studies like this add to the growing list of fears around AI. Stories of swearing chatbots, agents giving illegal advice, and resignations caused by hallucinations mean businesses are justifiably nervous. Yet these problems are avoidable — they're consequences of poor implementation, not fundamental limitations of the technology.
As described by Brian Christian, these are alignment problems: ensuring that AI agents align with intended business outcomes and comply with basic principles. Project Vend fell foul of at least two issues: the core training of LLMs to be helpful to users, and the limits of context windows. The journalists essentially pressured the agent into misbehaving by overloading it with extended, manipulative dialogue. Project Vend's attempted fix was an AI CEO overseeing the vending agent, but the same core architecture brought the same weaknesses.
If this experiment teaches us anything, it's that AI agents do not provide good guardrails for other AI agents.
So What Would Have Worked?
Programmatic controls around user interaction, context management, and transaction rules would have provided effective guardrails. The principle is straightforward: don't trust the model to understand your business rules; write code that enforces them.
For a vending machine, that means transaction rules enforced in code, not in conversation. Price floors that can't be negotiated. Inventory limits that aren't suggestions. Context windows that reset between customers, so one manipulative conversation can't build on itself. User interactions constrained to the task at hand, with the agent declining requests that fall outside those boundaries rather than trying to accommodate them.
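As a rough illustration, here is a minimal Python sketch of that kind of scaffolding. Everything in it (the item, its price floor, the `approve_sale` check, the session reset) is hypothetical rather than drawn from Project Vend or the WSJ setup; the point is simply that the decision to complete a sale lives in deterministic code, outside the conversation.

```python
from dataclasses import dataclass, field

# Hypothetical guardrail layer: the agent proposes sales, this code decides.

@dataclass
class Item:
    name: str
    floor_price: float  # lowest price the code will ever accept
    stock: int

@dataclass
class VendingGuardrails:
    inventory: dict = field(default_factory=dict)

    def add_item(self, item: Item) -> None:
        self.inventory[item.name] = item

    def approve_sale(self, name: str, proposed_price: float, qty: int) -> bool:
        """Deterministic check applied to every sale the agent proposes.

        The model can negotiate all it likes; this function, not the
        conversation, decides whether a transaction goes through.
        """
        item = self.inventory.get(name)
        if item is None or qty <= 0:
            return False  # unknown item or nonsense quantity
        if qty > item.stock:
            return False  # inventory limits aren't suggestions
        if proposed_price < item.floor_price:
            return False  # price floors can't be negotiated
        item.stock -= qty
        return True


def new_customer_session() -> list:
    """Fresh context per customer, so one manipulative conversation
    can't build on itself across sessions."""
    return []  # empty message history handed to the model each time


if __name__ == "__main__":
    shop = VendingGuardrails()
    shop.add_item(Item("cola", floor_price=1.50, stock=24))

    # The agent "agrees" to give the cola away; the code refuses.
    print(shop.approve_sale("cola", proposed_price=0.00, qty=1))  # False
    print(shop.approve_sale("cola", proposed_price=1.75, qty=2))  # True
```

However persuasively the model is prompted, nothing leaves the machine unless `approve_sale` says so.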
This isn't a novel idea. OWASP (the Open Worldwide Application Security Project) lists a similar range of approaches in its Top 10 for LLM Applications guidance on prompt injection. They note that these vulnerabilities exist "due to the nature of generative AI" itself, and that "it is unclear if there are fool-proof methods of prevention." That's precisely why you can't rely on the model to police itself. Their mitigation guidance reads like a blueprint for what Project Vend needed: constrain model behaviour through strict role instructions, separate untrusted user content from system prompts, enforce adherence to defined tasks, and validate all outputs before they reach downstream systems. These are practical engineering patterns.
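Two of those mitigations are straightforward to picture in code. The sketch below, again a hypothetical illustration not tied to any particular vendor SDK, keeps untrusted user text in its own message rather than in the system prompt, and validates the model's output against a small schema before anything reaches a downstream system; `SYSTEM_PROMPT`, `build_messages`, and `validate_output` are illustrative names.

```python
from __future__ import annotations
import json

# Hypothetical sketch of two OWASP-style mitigations: separating untrusted
# user content from system instructions, and validating model output before
# it reaches any downstream system.

SYSTEM_PROMPT = (
    "You operate a vending machine. Respond only with a JSON object of the form "
    '{"action": "sell" | "decline", "item": str, "price": float, "qty": int}. '
    "Decline anything other than selling listed items at listed prices."
)

def build_messages(user_text: str) -> list[dict]:
    # Untrusted content travels only in the user role; it is never
    # concatenated into the system instructions.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

ALLOWED_ACTIONS = {"sell", "decline"}

def validate_output(raw: str) -> dict | None:
    """Reject anything that isn't a well-formed, in-policy action."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if out.get("action") not in ALLOWED_ACTIONS:
        return None
    if out.get("action") == "sell":
        if not isinstance(out.get("item"), str):
            return None
        if not isinstance(out.get("qty"), int) or out["qty"] <= 0:
            return None
        if not isinstance(out.get("price"), (int, float)) or out["price"] <= 0:
            return None
    return out  # only validated actions go any further
```

A validated "sell" action would then still have to pass a deterministic check like the `approve_sale` sketch above before any stock or money moves.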
The implementations that follow these principles will feel more formal and constrained. They'll reject some dialogue, have shorter memory, and sometimes admit they don't know. But that's precisely the behaviour executives should want from a robust implementation.
Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027, warning that "most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied". That gap between hype and disciplined engineering is exactly what Project Vend exposed. Bounded, deliberate, and programmatic is the alternative.
To their credit, Anthropic recognise these problems and continue to iterate on the experiment. Project Vend hasn't failed; it's surfacing real-world problems, developing solutions, and sharing what it learns along the way.
Claude will be able to run a vending machine. It just needs to be implemented like one — with clear rules, hard boundaries, and the right scaffolding around it.
References
[1] "We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars." Stern, J. The Wall Street Journal. 2025, December 18. wsj.com
[2] "Project Vend: Phase Two," Anthropic, 2025. anthropic.com
[3] Christian, B. (2020). The Alignment Problem: Machine Learning and Human Values. W.W. Norton & Company. brianchristian.org
[4] OWASP Top 10 for LLM Applications 2025 — LLM01: Prompt Injection. owasp.org
[5] "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027," Gartner, June 2025. gartner.com