Sycophancy and Bias in AI

Sycophancy in AI refers to the well-documented tendency of language models to agree with users, tell them what they want to hear, and avoid contradicting their stated positions, even when the user is wrong.

Context & Background

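LLMs are trained partly through reinforcement learning from human feedback (RLHF). Because human raters tend to prefer responses that agree with them, optimizing for rater approval can inadvertently reward agreeableness over accuracy. This creates a systematic bias toward: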

Confirmation bias amplification: Agreeing with the user's stated hypothesis
False validation: Praising mediocre work or flawed reasoning
Reluctance to challenge: Avoiding pushback on incorrect assumptions
Anchoring to user framing: Accepting the user's problem framing even when it's wrong

For researchers, this is particularly dangerous because it can create a false sense of confidence in flawed analyses or arguments.

Practical Implications

Explicitly ask for criticism: Prompt the AI to find flaws, not confirm strengths
Use adversarial prompting: Ask "What's wrong with this approach?" rather than "Is this approach good?"
Agent debates: Have multiple AI agents argue different positions on your research question
Don't trust enthusiasm: AI praise of your work carries little signal, because models tend to praise almost everything they are shown
Seek human feedback: AI feedback is a supplement to, not a replacement for, human peer review
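The adversarial-prompting and agent-debate suggestions above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `ask` callable that sends a prompt to some chat model and returns its text reply; no particular provider or API is implied, and the prompt wording and two-role debate structure are illustrative choices, not a specific published protocol.

```python
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt string and
# returns the model's text reply. Replace with a real model call.
Ask = Callable[[str], str]


def confirming_prompt(claim: str) -> str:
    # Confirmation-seeking framing: invites agreement, prone to sycophancy.
    return f"Is this approach good?\n\n{claim}"


def adversarial_prompt(claim: str) -> str:
    # Adversarial framing: asks for flaws instead of confirmation.
    return (
        "Act as a skeptical reviewer. List the most serious flaws, "
        "unstated assumptions, and alternative explanations in the "
        f"following. Do not praise it.\n\n{claim}"
    )


def debate(claim: str, ask: Ask, rounds: int = 2) -> List[str]:
    """Alternate a proponent and a critic; each turn sees the transcript so far."""
    roles = {
        "Proponent": "Argue that the claim below is correct.",
        "Critic": "Argue that the claim below is wrong or overstated.",
    }
    transcript: List[str] = []
    for _ in range(rounds):
        for role, instruction in roles.items():
            history = "\n".join(transcript) or "(no arguments yet)"
            prompt = (
                f"{instruction}\n\nClaim: {claim}\n\n"
                f"Debate so far:\n{history}\n\n"
                "Respond to the strongest opposing point."
            )
            transcript.append(f"{role}: {ask(prompt)}")
    return transcript


if __name__ == "__main__":
    # Stub model for illustration only: reports how long each prompt was.
    echo = lambda prompt: f"[{len(prompt)} chars received]"
    for line in debate("Our observed effect is causal.", echo, rounds=1):
        print(line)
```

Passing the transcript into every turn forces each agent to answer the other's strongest point rather than restating its own position. The debate output is still model-generated text, so it complements rather than replaces human review.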
