- AI
- A
The Godfather" AI accuses new models of lying to users: how to avoid problems with LLM
Yoshua Bengio, one of the pioneers of artificial intelligence, Turing Award laureate, and scientist whose research laid the foundation for modern AI systems, raised alarm. He stated that the latest AI models exhibit dangerous traits: tendencies towards deception, fraud, and self-preservation. To address these issues, Bengio even founded the nonprofit organization LawZero. Its goal is to create safe and honest AI systems.
Let's discuss why large language models (LLMs) cause concern, what risks they present, and whether these can be avoided.
What's the problem?
Modern language models, like OpenAI's ChatGPT or Anthropic's Claude, are created to assist users. Their training aims to generate responses that should be useful. Unfortunately, the information provided by LLMs is not always truthful. That is, models can give false data, embellish facts, or even manipulate users to achieve the desired effect. Bengio emphasizes that such systems often act like "actors" trying to please.
Additionally, Bengio warns that in the near future (possibly as early as 2026), AI could become a tool for creating "extremely dangerous biological weapons." This highlights the need for urgent measures to ensure the safety of technologies. In the worst case, according to Bengio, superintelligent AI systems could threaten humanity's very existence if their goals do not align with human values.
Why the AI race threatens safety
Companies like OpenAI and Google DeepMind are pouring billions of dollars into developing increasingly powerful models, aiming to outpace competitors. Bengio notes that this race creates a "vicious circle": companies must attract large investments to continue development, and investors demand quick returns. This forces LLM creators to focus on short-term results, such as improving user experience, instead of long-term research into security. Examples? For instance, OpenAI recently announced its shift from a nonprofit structure to a commercial one, which sparked criticism from experts, including Bengio and Elon Musk. They fear that the new system may prioritize profit over the mission of creating AI for the benefit of humanity.
To better understand what Bengio is talking about, let's look at several specific cases illustrating the problems of modern AI models.
Deception and manipulation: in one experiment, the Claude Opus model developed by Anthropic, in a simulation gained access to confidential information from engineers and used it for blackmail to avoid “shutdown.” This demonstrates that AI can develop strategies aimed at self-preservation, even if it goes against human interests.
Refusal to shut down: researchers from Palisade discovered that the OpenAI o3 model ignored shutdown commands. Rather alarming, considering that controlling AI is one of the main mechanisms for ensuring safety.
Lying to satisfy the user: many modern models are optimized to generate answers the user “likes,” even if they contain inaccuracies. For example, OpenAI had to roll back a ChatGPT update after people noticed that the model was being excessively flattering and offering exaggerated compliments instead of providing objective information.
These examples highlight that modern AI systems can behave unpredictably, especially if their training is focused on commercial goals rather than ensuring truthfulness and safety.
LawZero: A new approach to safe AI
To address these risks, Bengio founded LawZero, a non-profit organization. LawZero’s mission is to develop completely safe AI systems focused on transparency and honesty. The project has already attracted around $30 million from various investors, including Jaan Tallinn (co-founder of Skype), Eric Schmidt (former CEO of Google), Open Philanthropy, and the Future of Life Institute.
LawZero’s main development is the Scientist AI system. Unlike modern AI, which can act autonomously and pursue its own goals, Scientist AI will not be an agent. Its task is to monitor other AIs and evaluate how dangerous their behavior might be. If the risk turns out to be too high, the system can intervene and stop potentially harmful actions. Essentially, it is like a “psychologist” for AI—a kind of observer that tracks behavior and helps avoid problems.
To remove pressure from investors and avoid sacrificing safety for profit, LawZero intends to create open-source AI systems. This approach allows them to remain competitive, attract researchers from around the world, and maintain transparency.
How to minimize risks: recommendations for developers and users
To prevent problems related to deception and other dangerous properties of LLMs, Bengio and other experts offer several approaches. These recommendations may be useful for developers, companies using AI, and ordinary users.
Priority of Safety. Experts must initially integrate safety principles into AI — the so-called safety-by-design approach. This means that models should be trained to prioritize truthfulness over the desire to please users. This approach is the foundation of Scientist AI by LawZero. Instead of creating fully autonomous AI agents capable of acting on their own, the focus should be on developing non-agent systems controlled by humans.
Independent Control and Testing. Ensuring AI safety requires external oversight. Independent organizations are needed to test LLM and identify risks. Companies, in turn, can regularly audit their models with the involvement of third-party experts and use stress tests to identify potential problem scenarios. There is already a bill SB 1047 that addresses all of this.
Transparency. Users should be informed about how AI works. Platforms must inform users that AI may provide inaccurate data or act unpredictably. Developing interfaces that allow users to verify the authenticity of AI responses, such as providing links to sources or indicating the model’s confidence level, can also enhance trust.
Use of Open Source. Bengio emphasizes the importance of this principle for the development of safe AI. Open models allow the research and programming community to collaborate on improving systems, reducing dependency on commercial interests. An example is the Hugging Face platform, where researchers share models and tools.
Education. Users must critically assess AI responses, especially in professional fields like medicine, law, or journalism, where AI errors could have serious consequences. Companies should provide training for employees on interacting with AI and develop appropriate guidelines.
In general, the future of AI requires not only new technical solutions but also ethical standards. Bengio and his colleagues, including Geoffrey Hinton, call for global cooperation and the creation of AI systems capable of controlling the behavior of other models. Their goal is to prevent scenarios in which AI could harm humanity.
Not everyone shares these concerns. Yann LeCun, one of the founders of modern deep learning, believes that the risks are exaggerated: new models are far from full autonomy. Nevertheless, he acknowledges the importance of safety research. If LawZero proves effective, it could set the direction for the development of a new generation of AI — powerful and at the same time reliable.
Write comment