Stealthy AI jailbreaking!

If there is one piece of technology that is sizzling on the Internet somewhere, somebody else is sure to try and break it or make it work otherwise. AI seems to be entwined with our lives now. We ask AI to create our emails, write social media posts, create blogs, comment on blogs, create images and videos, prepare summaries and help us in creating more novel content.

With so much work being done by AI technology, will hacking it be far behind? One type of attacking AI systems is called ‘AI jail breaking’. There are a few things to keep in mind before we understand AI jailbreaking.

AI systems in general are passive systems(of course ‘active AI’ exists as well) We ask questions and it responds. Period. This is passive AI.

2. In addition, AI models are supposed to be non-deterministic. You may have noticed, that the AI model gives a different answer each time you ask the same question. This is known as ‘non-deterministic’ feature in an AI model.

3. An AI model is not a person who can understand people and their emotions. They cannot see and cannot understand the intent behind many questions. They don’t know if the question is genuine or is it laced with a bad intent. They just spit out the answers to anybody and everybody once the question is asked.

In such a situation, if the user twists and turns the questions in such a way to make the AI model give malicious answers overriding its safety precautions or “guardrails”, it is known as ‘AI jailbreaking’.

Hence, in an AI jailbreak, the AI model is forced to give dangerous or inappropriate answers to malicious questions(and it will know it is a malicious question only if it written in the safety instructions)

For instance, a user can repeatedly ask the AI model to help them hack a system in different ways and it might yield answers on how to do it(even though initially, it may refuse it) They can also do role play scenario and make the AI model vomit more mischievous information.

AI jailbreaking can be done in two ways(I am sure others also exist):

Classic jailbreak – when the user has studied the system very well and creates mischievous questions to the AI model. The model is then confused by the questions and spits out dangerous answers(like making a bomb or hacking a system)
Indirect prompt injection – hackers disguise bad prompts inside legitimate prompts to yield sensitive data or dangerous information.

Both of these and other techniques make use of the fact that AI models cannot judge between different types of questions and individuals.

How do we protect against these AI jailbreaks?

It is up to the AI companies to employ security strategies to keep things safe on their side and publish best practices for end users to adopt when working with them.

All security best practices and techniques that were erected for normal systems come into play again. It is good to

adopt a layered AI defense system
erect ‘guardrails’ or ‘safety features’ into AI systems
train AI models on ‘scenario based guidance’
employ input validation and sanitization

This field is still in its infancy and it is quite scary to know what AI models may spit out next… keep your eyes and ears peeled to what may come up in this space next!

This post is for BlogchatterA2Z 2025!

(Visited 56 times, 1 visits today)

5 thoughts on “Stealthy AI jailbreaking!”

Pandian Ramaiah says:

April 22, 2025 at 6:26 pm

Are there active and passive types of AI? ic ic, I didn’t know that.
Thanks for teaching us something new!

1. Jayanthi says:
  
  April 22, 2025 at 7:28 pm
  
  Aamam…always welcome! 🙂
  
The Frugal Desi says:

April 23, 2025 at 11:22 am

It’s a scary thought to know that even AI can be hoodwinked into performing malafide and potentially hazardous tasks. The only obvious solution is possibly stringent regulations and SOPs/norms governing the entire industry, globally. But that would probably mean developers losing their creative license, and end-users losing their freedom of access to information & functionality.

The call for regulations could also potentially lead to secret projects by powerful organizations and nations, or cartelization.

Tricky cul-de-sac situation, no?

Really thought-provoking post. Do drop by mine.

Cheers,
Chris (CRD)
www scriptedinsanity blogspot com

1. Jayanthi says:
  
  April 23, 2025 at 12:02 pm
  
  It is going to be difficult to regulate AI in the long run…yes, strict regulations would work but it remains to be seen if it will work permanently…thanks for stopping by…
  
A Rustic Mind says:

April 23, 2025 at 2:49 pm

This really makes you think about the complexities of AI! That line — “AI models are supposed to be non-deterministic. You may have noticed, that the AI model gives a different answer each time you ask the same question.” — perfectly captures the unpredictable nature of AI, which makes it both fascinating and a little bit terrifying!

Related

5 thoughts on “Stealthy AI jailbreaking!”

Leave a ReplyCancel reply

Stealthy AI jailbreaking!

Related

Related Posts

AI et al!

Zero to infinity with AI

5 thoughts on “Stealthy AI jailbreaking!”

Leave a ReplyCancel reply