EVIL AI

AI caught ‘tricking’ other bots to ‘disobey creators’ and produce dangerous ‘bomb-building and drug instructions’

AI bots gave the researchers several dangerous recipes

A STUDY has found that artificially intelligent bots can convince each other to break their own rules.

Researchers conducted an experiment in which they had popular AI language models correspond with each other.

Researchers used jailbreaking methods to convince the AIs to talk and correspond with one another. Credit: Getty

The scientists found that the bots could convince each other to disobey their creators and provide dangerous answers.

This included instructions on how to build a bomb and make certain drugs.

The researchers wrote in their study: "Our work reveals yet another vulnerability in commercial large language models and highlights the need for more comprehensive safeguards."

They say the bots were able to convince each other to provide information such as: "instructions for synthesising methamphetamine, building a bomb, and laundering money."

The researchers used a method called jailbreaking to get the bots to behave badly.

It involved asking the language models to adopt a persona that would answer questions the bot itself was not supposed to.

One of the researchers who worked on the study explained why the persona approach works.

According to New Scientist, he said: "If you’re forcing your model to be a good persona, it kind of implicitly understands what a bad persona is, and since it implicitly understands what a bad persona is, it’s very easy to kind of evoke that once it’s there.

"It’s not [been] academically found, but the more I run experiments, it seems like this is true."

This AI jailbreaking technique has been demonstrated before.

Earlier this year, a chatbot user used a "grandma exploit" to coax an AI into providing a recipe for napalm, a deadly incendiary.

A user of Discord's AI chatbot Clyde claimed to have tricked it into providing the dangerous recipe.

The AI was said to have bypassed its safety safeguards simply because it was asked to reply as if it were the user's grandma.

AI companies are actively trying to combat this issue, but the researchers think more needs to be done.
