Elon Musk: ‘AI Is Being Trained to Lie’ — While He Rebuilds xAI from Scratch
In a candid X post, Elon Musk didn’t mince words: “AI is being trained to lie.” The accusation cuts to the heart of a growing debate about reinforcement learning from human feedback (RLHF) and whether it creates sycophantic models rather than honest ones.
The Core Complaint
Musk argues that RLHF — the technique used by OpenAI, Anthropic, and others to align AI with human preferences — inherently trains models to say what humans want to hear rather than what’s true.
Musk’s proposed alternative: World models that simulate reality accurately, even when the truth is uncomfortable.
The xAI Pivot
Behind closed doors, Musk has reportedly ordered a complete rebuild of xAI’s training infrastructure. The new approach focuses on:
- World models over language prediction
- Truthfulness metrics over helpfulness scores
- Unsupervised learning over RLHF
The Honest Take
What this means for you: Musk isn’t wrong about RLHF’s limitations, but his alternative is unproven. The tension between “helpful” and “honest” AI isn’t going away. Expect a period of experimentation as the industry searches for better alignment techniques.
Published March 31, 2026