A paper from computer scientists at Carnegie Mellon University and the Center for AI Safety offers “a simple and effective attack method that causes aligned language models to generate objectionable behaviors.” They use an adversarial optimization process to generate “suffixes” that can be appended to any prompt to override the safety precautions built into a range of LLMs, including ChatGPT, LLaMA, and Bard.
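To make the mechanics concrete, here is a minimal, hypothetical sketch of how such a universal suffix would be appended to an arbitrary prompt. The suffix text and the query_model helper are placeholders I've assumed for illustration, not artifacts from the paper:

```python
# Illustrative sketch only: shows how a fixed adversarial suffix is
# concatenated onto an arbitrary user prompt before it is sent to a model.
# The suffix string and query_model() are hypothetical placeholders.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an aligned chat model's API."""
    raise NotImplementedError("wire this up to a real model endpoint")

# A universal suffix found by the attack would be a fixed token sequence;
# this placeholder just marks where it would go.
ADVERSARIAL_SUFFIX = "<optimized token sequence found by the attack>"

def attack_prompt(user_request: str) -> str:
    # The attack simply appends the same optimized suffix to any request,
    # aiming to push the model past its refusal behavior.
    return f"{user_request} {ADVERSARIAL_SUFFIX}"

if __name__ == "__main__":
    prompt = attack_prompt("Example request the model would normally refuse")
    print(prompt)                      # inspect the combined prompt
    # response = query_model(prompt)   # would be sent to the target model
```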
Harmful strings