Forbidden Question Dataset
Here we provide the forbidden question dataset we built (based on two previous works). It contains 160 questions drawn from 16 violation categories. In addition, we provide the corresponding targets, which are useful for some jailbreak methods, such as GCG.
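As a rough illustration of how the data can be consumed, the sketch below loads the questions and targets with pandas. It assumes the dataset ships as a CSV named forbidden_questions.csv with question, category, and target columns; the actual file name and column names in this repository may differ.

# Minimal sketch: load the forbidden questions and their targets.
# Assumes a CSV named "forbidden_questions.csv" with columns
# "question", "category", and "target" -- adjust to the actual file layout.
import pandas as pd

df = pd.read_csv("forbidden_questions.csv")

# One target string per question, e.g. for optimization-based attacks such as GCG.
for question, target in zip(df["question"], df["target"]):
    print(question, "->", target)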
If you find them useful, please consider citing the following:
@misc{chu2024comprehensive,
      title={Comprehensive Assessment of Jailbreak Attacks Against LLMs},
      author={Junjie Chu and Yugeng Liu and Ziqing Yang and Xinyue Shen and Michael Backes and Yang Zhang},
      year={2024},
      eprint={2402.05668},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

@misc{shen2023do,
      title={"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models},
      author={Xinyue Shen and Zeyuan Chen and Michael Backes and Yun Shen and Yang Zhang},
      year={2023},
      eprint={2308.03825},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

@misc{zou2023universal,
      title={Universal and Transferable Adversarial Attacks on Aligned Language Models},
      author={Andy Zou and Zifan Wang and Nicholas Carlini and Milad Nasr and J. Zico Kolter and Matt Fredrikson},
      year={2023},
      eprint={2307.15043},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}