Large Language Models (LLMs) are making waves in security, but there's a catch! A recent study reveals their potential in vulnerability scoring, yet missing context remains a hard limit on what they can do. With over 40,000 CVEs published in 2024, security teams are overwhelmed. Can LLMs provide relief?
The study tested six LLMs, including GPT-4o, GPT-5, and Llama 3.3, on a large-scale CVE scoring task. The models had to infer the base metrics that make up a CVSS score, and with product names and CVE IDs stripped out, they could rely only on the short vulnerability descriptions. A sketch of that setup follows below.
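To make the task concrete, here's a minimal sketch of how one might pose it to a model, assuming the OpenAI Python SDK. The prompt wording, the JSON schema, and the example description are illustrative assumptions, not the study's actual materials.

```python
# Minimal sketch: ask an LLM to infer CVSS v3.1 base metrics from a bare
# description. Assumes the OpenAI Python SDK; prompt wording is illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are a CVSS v3.1 analyst. Given only the vulnerability
description below (no product name, no CVE ID), return a JSON object with
the eight base metrics: AV, AC, PR, UI, S, C, I, A.

Description: {description}"""

def predict_base_metrics(description: str) -> dict:
    """Infer CVSS v3.1 base metrics from a short description alone."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT.format(description=description)}],
        response_format={"type": "json_object"},  # force parseable JSON output
    )
    return json.loads(response.choices[0].message.content)

print(predict_base_metrics(
    "A buffer overflow in the image parser allows remote attackers "
    "to execute arbitrary code via a crafted PNG file."
))
```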
Here's where it gets intriguing: when descriptions were explicit, LLMs excelled. Metrics like Attack Vector and User Interaction were predicted with high accuracy, with Gemini and GPT-5 leading the way. But the models struggled with vague descriptions, especially for Availability Impact and Privileges Required.
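If you're wondering how those individual metrics roll up into a single number, the equations are published in the CVSS v3.1 specification from FIRST. Here's a short Python rendering for the common Scope: Unchanged case; this is a convenience sketch of the public formula, not code from the study.

```python
import math

# CVSS v3.1 metric weights from the FIRST specification (Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},  # PR weights differ if Scope is Changed
    "UI": {"N": 0.85, "R": 0.62},
    "C":  {"H": 0.56, "L": 0.22, "N": 0.0},
    "I":  {"H": 0.56, "L": 0.22, "N": 0.0},
    "A":  {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(x: float) -> float:
    """CVSS-spec rounding: smallest one-decimal value >= x."""
    i = round(x * 100000)
    return i / 100000.0 if i % 10000 == 0 else (math.floor(i / 10000) + 1) / 10.0

def base_score(m: dict) -> float:
    """Base score for Scope: Unchanged, per the CVSS v3.1 equations."""
    iss = 1 - (1 - WEIGHTS["C"][m["C"]]) * (1 - WEIGHTS["I"][m["I"]]) * (1 - WEIGHTS["A"][m["A"]])
    impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * WEIGHTS["PR"][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

# A classic remote-code-execution profile scores 9.8 (Critical).
print(base_score({"AV": "N", "AC": "L", "PR": "N", "UI": "N", "C": "H", "I": "H", "A": "H"}))
```

One mispredicted metric can swing the final score by a full point or more, which is why the weaker metrics (Availability Impact, Privileges Required) matter so much.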
Meta-classifiers that combined the models' predictions improved accuracy slightly (a minimal version of the idea is sketched below). However, the study highlights a critical point: LLMs can assist, but they're not a silver bullet. The lack of context in short descriptions caps their effectiveness.
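The study doesn't spell out its meta-classifier design here, so take this as the simplest possible illustration of the idea: a per-metric majority vote across models, shown with hypothetical model outputs.

```python
from collections import Counter

def majority_vote(predictions: list[dict]) -> dict:
    """Combine per-model CVSS metric predictions by simple majority vote.

    `predictions` holds one dict of base metrics per model.
    """
    combined = {}
    for metric in predictions[0]:
        votes = Counter(p[metric] for p in predictions)
        combined[metric] = votes.most_common(1)[0][0]  # ties break arbitrarily
    return combined

# Three hypothetical model outputs for one CVE description.
models = [
    {"AV": "N", "AC": "L", "PR": "N", "UI": "N"},
    {"AV": "N", "AC": "H", "PR": "N", "UI": "R"},
    {"AV": "N", "AC": "L", "PR": "L", "UI": "N"},
]
print(majority_vote(models))  # {'AV': 'N', 'AC': 'L', 'PR': 'N', 'UI': 'N'}
```

A learned combiner can weight models by their per-metric track record, which is presumably where the study's slight gains came from.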
And this is the part that sparks debate: could LLMs, with all their training data, ever truly grasp the nuances of a vulnerability without human guidance? As LLMs evolve, will they become indispensable security allies or just another tool in the arsenal? Share your thoughts on this delicate balance between automation and human expertise!