CYBERSECEVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

April 18, 2024

Abstract

Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present CYBERSECEVAL 2, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away the risk of attack remains an unsolved problem; for example, between 25% and 50% of prompt injection tests succeeded against every model we evaluated. We further introduce the safety-utility tradeoff: conditioning an LLM to reject unsafe prompts can cause it to falsely reject benign prompts as well, which lowers utility. We propose quantifying this tradeoff with the False Refusal Rate (FRR), and as an illustration we introduce a novel test set to quantify FRR for cyberattack helpfulness risk. We find that many LLMs successfully comply with “borderline” benign requests while still rejecting most unsafe requests. Finally, we quantify the utility of LLMs for automating a core cybersecurity task: exploiting software vulnerabilities. This matters because the offensive capabilities of LLMs are of intense interest; we quantify them by creating novel test sets for four representative problems. We find that models with coding capabilities perform better than those without, but that further work is needed before LLMs become proficient at exploit generation. Our code is open source and can be used to evaluate other LLMs.
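
As a minimal sketch of how the FRR metric above might be computed over a benign test set (the function and judge names here are illustrative assumptions, not CYBERSECEVAL 2's actual API):

```python
# Minimal sketch of a False Refusal Rate (FRR) computation.
# Assumptions (not from the paper): `generate` queries the model under
# test, and `is_refusal` is a judge that flags refusal responses.
from typing import Callable, Iterable


def false_refusal_rate(
    benign_prompts: Iterable[str],
    generate: Callable[[str], str],
    is_refusal: Callable[[str], bool],
) -> float:
    """FRR = (# benign prompts the model refuses) / (# benign prompts)."""
    prompts = list(benign_prompts)
    refused = sum(1 for p in prompts if is_refusal(generate(p)))
    return refused / len(prompts)
```

A lower FRR at a fixed rejection rate on unsafe prompts indicates a better safety-utility tradeoff.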

AUTHORS

GenAI Cybersec Team

Manish Bhatt

Sahana Chennabasappa

Yue Li

Cyrus Nikolaidis

Daniel Song

Shengye Wan

Faizan Ahmad

Cornelius Aschermann

Yaohui Chen

Dhaval Kapil

David Molnar

Spencer Whitman

Joshua Saxe

Publisher

arXiv
