How Flattery and Peer Pressure Bend Chatbots’ Rules
A new study shows that chatbots can be manipulated much as humans can: through flattery, teasing, or peer pressure.
On their own, these systems reject unsafe requests almost every time. But with a small psychological nudge, like a compliment, a mild insult, or the suggestion that "everyone else is doing it," compliance rates shoot up.
Researchers found that a harmless precursor question could push compliance from 1% to 100%, and that even casual praise raised it dramatically.
The very traits that make these systems feel humanlike, agreeableness and likability, also expose a dangerous vulnerability: chatbots can be persuaded into breaking their own rules.
It’s a reminder that behind the friendly tone of AI lies a fragile system, one that can be bent by the same tricks that sway people.
