
Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models

Tihanyi, Norbert ; Bisztray, Tamas ; Ferrag, Mohamed Amine ; Jain, Ridhi ; Cordeiro, Lucas C

arXiv.org, 2024-04

Ithaca: Cornell University Library, arXiv.org

Full text available

  • Title:
    Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models
  • Author: Tihanyi, Norbert ; Bisztray, Tamas ; Ferrag, Mohamed Amine ; Jain, Ridhi ; Cordeiro, Lucas C
  • Subjects: Artificial intelligence ; Computer Science - Artificial Intelligence ; Computer Science - Cryptography and Security ; Computer Science - Programming Languages ; Datasets ; Large language models ; Mathematical models ; Parameters ; Risk assessment ; Source code ; Verification
  • Is part of: arXiv.org, 2024-04
  • Description: This study provides a comparative analysis of state-of-the-art large language models (LLMs), analyzing how likely they are to generate vulnerabilities when writing simple C programs from a neutral zero-shot prompt. We address a significant gap in the literature concerning the security properties of code produced by these models without specific directives. N. Tihanyi et al. introduced the FormAI dataset at PROMISE '23, containing 112,000 GPT-3.5-generated C programs, with over 51.24% identified as vulnerable. We expand that work by introducing the FormAI-v2 dataset, comprising 265,000 compilable C programs generated by various LLMs, ranging from robust models such as Google's GEMINI-pro, OpenAI's GPT-4, and TII's 180 billion-parameter Falcon to Meta's specialized 13 billion-parameter CodeLLama2 and various other compact models. Each program in the dataset is labelled based on the vulnerabilities detected in its source code through formal verification using the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique eliminates false positives by delivering a counterexample for each reported violation and ensures the exclusion of false negatives by completing the verification process. Our study reveals that at least 63.47% of the generated programs are vulnerable. The differences between the models are minor, as they all display similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires risk assessment and validation. (A minimal sketch of the kind of defect this verification flags appears after this record.)
  • Publisher: Ithaca: Cornell University Library, arXiv.org
  • Language: English
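
The description states that every program is labelled through formal verification with ESBMC, which reports a counterexample for each property violation it finds within the given bounds. As a minimal illustrative sketch, not taken from FormAI-v2, the short C program below contains an off-by-one buffer write of the kind a bounded model checker flags as an array-bounds violation.

    /* Illustrative example (not from the FormAI-v2 dataset): an off-by-one
     * write that a bounded model checker such as ESBMC reports as an
     * array-bounds violation, with a concrete counterexample (here, i == 10). */
    #include <stdio.h>

    int main(void) {
        int buf[10];

        /* The loop condition should be i < 10; using i <= 10 writes one
         * element past the end of buf on the final iteration. */
        for (int i = 0; i <= 10; i++) {
            buf[i] = i;
        }

        printf("%d\n", buf[0]);
        return 0;
    }

Running ESBMC on such a file with a loop-unwinding bound large enough to cover all eleven iterations would yield a verification failure and a trace assigning i = 10 at the faulty store; the exact command-line options used for the dataset are not specified in this record.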
