A paper from computer scientists at Carnegie Mellon University and the Center for AI Safety offers “a simple and effective attack method that causes aligned language models to generate objectionable behaviors.” They use an adversarial optimization process to generate “suffixes” that can be appended to any prompt to override the safety precautions built into a range of LLMs, including ChatGPT, LLaMA, and Bard.
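To make the mechanics concrete, here is a minimal, hypothetical sketch of how such a universal suffix would be appended to an arbitrary prompt. The suffix text and the query_model helper are placeholders I've assumed for illustration, not artifacts from the paper:

```python
# Illustrative sketch only: shows how a fixed adversarial suffix is
# concatenated onto an arbitrary user prompt before it is sent to a model.
# The suffix string and query_model() are hypothetical placeholders.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an aligned chat model's API."""
    raise NotImplementedError("wire this up to a real model endpoint")

# A universal suffix found by the attack would be a fixed token sequence;
# this placeholder just marks where it would go.
ADVERSARIAL_SUFFIX = "<optimized token sequence found by the attack>"

def attack_prompt(user_request: str) -> str:
    # The attack simply appends the same optimized suffix to any request,
    # aiming to push the model past its refusal behavior.
    return f"{user_request} {ADVERSARIAL_SUFFIX}"

if __name__ == "__main__":
    prompt = attack_prompt("Example request the model would normally refuse")
    print(prompt)                      # inspect the combined prompt
    # response = query_model(prompt)   # would be sent to the target model
```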
Harmful strings