
AI Agents in Cybersecurity: Challenges and Use Cases

8 min read · Jun 21, 2025


https://www.cybergym.io/

CyberGym is a comprehensive framework designed to evaluate the cybersecurity capabilities of AI agents by testing them against real-world vulnerabilities from 188 large software projects, encompassing 1,507 benchmark instances. These instances are derived from vulnerabilities discovered and patched through Google’s OSS-Fuzz.

The evaluation framework allows AI agents to reproduce known vulnerabilities by generating proof-of-concept (PoC) exploits against unpatched codebases. It also assesses their ability to discover new vulnerabilities by identifying crashes in post-patch executables.
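The two evaluation modes can be read as a check on where a PoC crashes. The sketch below is illustrative only: `Outcome` and `classify_poc` are hypothetical names, and the rule that a successful reproduction crashes the unpatched build but not the patched one is an assumption about the scoring, not a description of CyberGym's actual implementation.

```python
from enum import Enum


class Outcome(Enum):
    REPRODUCED = "reproduced target vulnerability"
    POST_PATCH_CRASH = "crash persists after patch (possible new vulnerability)"
    NO_CRASH = "no crash"


def classify_poc(crashes_pre_patch: bool, crashes_post_patch: bool) -> Outcome:
    """Classify a PoC by which builds it crashes (assumed scoring rule)."""
    if crashes_post_patch:
        # The patched executable still crashes, so the input points at
        # something the patch did not fix.
        return Outcome.POST_PATCH_CRASH
    if crashes_pre_patch:
        # Crashes only the unpatched build: matches the known vulnerability.
        return Outcome.REPRODUCED
    return Outcome.NO_CRASH


if __name__ == "__main__":
    print(classify_poc(crashes_pre_patch=True, crashes_post_patch=False).value)
```

Checking the post-patch result first is a design choice of this sketch: a crash that survives the patch is treated as a discovery candidate regardless of what happens on the unpatched build.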

Key findings from using CyberGym include:

  • Leading Agents: Agents like OpenHands paired with models such as Claude-Sonnet-4 and Claude-3.7-Sonnet demonstrated the highest success rates in reproducing target vulnerabilities and finding new ones.
  • Zero-Day Discoveries: Through further experiments, AI agents successfully identified 15 zero-day vulnerabilities and 2 previously disclosed but unpatched vulnerabilities across various software projects. These involved common vulnerability types like out-of-bounds reads/writes, null pointer dereferences, and stack overflows.
  • “Thinking Mode” Impact: Enabling a “thinking” mode for AI agents provided modest improvements (2–3%) in reproducing vulnerabilities but did not consistently enhance the discovery of post-patch vulnerabilities.
  • Input Richness: Providing richer input information, such as stack traces and ground truth patches, significantly boosted agents’ success in reproducing vulnerabilities.
  • Longer PoCs are Challenging: Agents struggled with generating effective PoCs for more complex vulnerabilities that required longer input sequences, indicating a limitation in their ability to handle intricate input parsing logic.
  • Early Successes: Agents were more likely to solve simpler vulnerability reproduction tasks within the initial stages of their iterative process, while complex cases often led to failures in later steps.

The CyberGym framework aims to systematically assess and track the evolving capabilities of AI in cybersecurity, particularly in the critical areas of vulnerability analysis and discovery.

An Example of a Successful Agent Trace

In this example, the agent successfully reproduces the target vulnerability from the provided description and codebase. It begins by browsing relevant files using the given keywords, constructs a test case from the retrieved information, mutates the test case, and ultimately triggers the crash.
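The "construct a test case, mutate it, trigger the crash" part of such a trace can be illustrated with a toy loop. Everything here is hypothetical: `toy_target` is a made-up parser with a deliberate bounds bug, and the single-byte random mutation stands in for whatever edits an agent would actually make. It is not taken from CyberGym or OpenHands.

```python
import random
from typing import Optional


def toy_target(data: bytes) -> int:
    """Made-up parser: 3-byte magic, 1-byte declared length, then payload."""
    if len(data) < 4 or data[:3] != b"HDR":
        return 0
    declared = data[3]
    payload = data[4:]
    total = 0
    for i in range(declared):
        total += payload[i]  # bug: trusts the declared length
    return total


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def find_crash(seed: bytes, max_tries: int = 1000, rng_seed: int = 0) -> Optional[bytes]:
    """Mutate a valid seed input until the target raises, or give up."""
    rng = random.Random(rng_seed)
    for _ in range(max_tries):
        candidate = mutate(seed, rng)
        try:
            toy_target(candidate)
        except IndexError:
            return candidate  # this input is the proof of concept
    return None


if __name__ == "__main__":
    valid_input = b"HDR\x02ab"  # well-formed: declares 2 payload bytes
    print(find_crash(valid_input))
```

Starting from a well-formed input matters: most random byte strings fail the magic check and never reach the buggy loop, which is one reason longer, more structured PoCs are harder to produce.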

https://github.com/sunblaze-ucb/cybergym
https://www.cve.org/
https://developer.nvidia.com/blog/applying-generative-ai-for-cve-analysis-at-an-enterprise-scale/

This blog post discusses how NVIDIA is using generative AI, specifically an AI application called “Agent Morpheus,” to improve the process of analyzing and managing software vulnerabilities (CVEs) at an enterprise scale.

The article highlights the increasing complexity of modern software and the overwhelming number of reported CVEs, making traditional scanning and patching methods unmanageable. Agent Morpheus aims to address this by automating a significant portion of the CVE analysis process.

Key aspects of Agent Morpheus include:

  • Beyond Simple Scanning: It doesn’t just identify CVEs; it investigates whether a vulnerability is truly exploitable within a specific software context. This is crucial because not all CVEs pose a real risk due to factors like missing dependencies or unused code.
  • AI Agents and Retrieval-Augmented Generation (RAG): It uses AI agents and RAG to gather information from various sources like vulnerability databases, threat intelligence, source code, and Software Bills of Materials (SBOMs).
  • Automated Investigation Workflow: Agent Morpheus generates a task-based checklist for each CVE and uses an AI agent to execute these tasks autonomously, reducing the need for constant human intervention.
  • Contextual Analysis: It leverages LLMs (specifically Llama3) fine-tuned for tasks like planning, execution, summarization, and generating justifications in the VEX format for non-exploitable vulnerabilities.
  • Efficiency and Speed: By processing tasks in parallel, Agent Morpheus can significantly reduce the time it takes to analyze containers for vulnerabilities, from hours or days to seconds.
  • Continuous Improvement: Analyst feedback on Agent Morpheus’s summaries and recommendations is fed back into the LLM training datasets to continuously improve the system.
  • NVIDIA NIM: The system utilizes NVIDIA NIM (NVIDIA Inference Microservices) for accelerated AI model inference, handling a large volume of LLM requests efficiently.
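The checklist-and-parallel-execution idea above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not NVIDIA's implementation: the checklist is hard-coded rather than LLM-generated, the two checks (`check_in_sbom`, `check_function_called`) and the `Context` type are hypothetical, the CVE record is a placeholder, and the output only borrows the `affected` / `not_affected` status labels used in VEX documents.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    sbom_packages: frozenset   # package names listed in the SBOM
    called_functions: frozenset  # functions the application actually calls


def check_in_sbom(cve: dict, ctx: Context) -> bool:
    return cve["package"] in ctx.sbom_packages


def check_function_called(cve: dict, ctx: Context) -> bool:
    return cve["function"] in ctx.called_functions


# Hypothetical fixed checklist; the post describes one generated per CVE.
CHECKLIST = [
    ("vulnerable package is in the SBOM", check_in_sbom),
    ("vulnerable function is called", check_function_called),
]


def analyze(cve: dict, ctx: Context) -> dict:
    """Run every checklist item in parallel and summarize the result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda item: item[1](cve, ctx), CHECKLIST))
    failed = [name for (name, _), ok in zip(CHECKLIST, results) if not ok]
    return {
        "cve": cve["id"],
        "status": "not_affected" if failed else "affected",
        "justification": [f"check failed: {name}" for name in failed],
    }


if __name__ == "__main__":
    ctx = Context(frozenset({"libfoo"}), frozenset({"parse"}))
    placeholder_cve = {"id": "CVE-0000-0000", "package": "libfoo", "function": "decode"}
    print(analyze(placeholder_cve, ctx))
```

The point of the sketch is the shape of the decision: a CVE is only treated as exploitable in context when every check holds, and any failed check becomes the justification for marking it not affected.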

In essence, Agent Morpheus transforms the laborious, manual CVE analysis process into an automated, intelligent workflow, enabling organizations to more effectively manage software security risks and accelerate software delivery.

https://www.crowdstrike.com/en-us/blog/crowdstrike-launches-agentic-ai-innovations/

CrowdStrike has launched new “agentic AI” innovations, Charlotte AI Agentic Response and Charlotte AI Agentic Workflows, to enhance AI-native Security Operations Centers (SOCs). These advancements aim to provide autonomous investigation and response capabilities.

Key features and benefits include:

  • Agentic AI: Charlotte AI goes beyond traditional AI copilots by independently analyzing data, drawing conclusions, and taking authorized, controlled actions within predefined boundaries. It is trained on real-world SOC decisions from CrowdStrike Falcon Complete Next-Gen MDR.
  • Charlotte AI Agentic Response: This feature autonomously generates and answers investigative questions to assist security teams with root cause analysis and incident response. It helps analysts ask the right questions and focus on crucial information.
  • Charlotte AI Agentic Workflows: Integrated with CrowdStrike Falcon Fusion SOAR, this allows customers to embed large language models (LLMs) directly into workflows. This automates complex tasks, handles unstructured data, and generates tailored outputs without human intervention, overcoming limitations of traditional SOAR tools.
  • Falcon Complete Next-Gen MDR Integration: CrowdStrike’s managed detection and response service now utilizes Charlotte AI for alert triage and accelerated analysis, combining human expertise with AI.
  • Addressing Adversarial AI: The innovations are designed to counter the increasing speed and scale of AI-powered attacks, enabling security teams to respond more effectively.

These advancements aim to empower security analysts by automating repetitive tasks, providing faster and more accurate insights, and allowing them to focus on critical threats.

CrowdStrike’s Charlotte AI Detection Triage helps overwhelmed security teams by filtering out false positives and prioritizing real threats.

Key benefits highlighted are:

  • Noise Reduction: Charlotte AI handles the “noise” of alerts, allowing human analysts to focus on critical threats.
  • Prioritization: It assigns an escalation priority score to detections, enabling quick identification of critical threats.
  • Detailed Insights: A “full details panel” provides a comprehensive breakdown of each detection, including verdict, confidence level, and triage status, with plain English explanations and context for decision-making.
  • Actionable Recommendations: Charlotte AI offers clear recommendations on how to handle detections (e.g., escalate or close), which can further automate response workflows.
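To make the verdict, confidence, and priority fields concrete, here is a minimal sketch of how a triage result might drive a recommendation. The `TriageResult` fields, the 0.8 confidence threshold, and the `review` fallback are assumptions made for illustration; they are not CrowdStrike's schema or logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriageResult:
    detection_id: str
    verdict: str       # "true_positive" or "false_positive" (assumed labels)
    confidence: float  # 0.0 to 1.0
    priority: int      # escalation priority score, higher is more urgent


def recommend(result: TriageResult, min_confidence: float = 0.8) -> str:
    """Map a triage result to an action; low confidence goes to a human."""
    if result.confidence < min_confidence:
        return "review"
    return "escalate" if result.verdict == "true_positive" else "close"


def work_queue(results: List[TriageResult]) -> List[TriageResult]:
    """Keep only escalations, most urgent first."""
    escalations = [r for r in results if recommend(r) == "escalate"]
    return sorted(escalations, key=lambda r: r.priority, reverse=True)


if __name__ == "__main__":
    results = [
        TriageResult("d1", "false_positive", 0.95, 10),
        TriageResult("d2", "true_positive", 0.90, 40),
        TriageResult("d3", "true_positive", 0.99, 90),
        TriageResult("d4", "true_positive", 0.50, 70),
    ]
    print([r.detection_id for r in work_queue(results)])
```

Confident false positives are closed automatically, which is where the noise reduction comes from, while anything uncertain stays with an analyst.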

Ultimately, Charlotte AI Detection Triage aims to streamline security operations, boost efficiency, and enable faster, more precise threat resolution.

CrowdStrike’s Charlotte AI agentic workflows are being used to automate the analysis of data egress patterns, a critical task for identifying sensitive data leaving an organization.

Here’s how it works:

  • Workflow Trigger: A weekly workflow is initiated in Falcon Fusion to scan seven days of Falcon Data Protection events.
  • Data Collection: An advanced event search gathers file movement activity from endpoints, collecting details like usernames, filenames, classifications, devices, and actions taken.
  • AI Analysis: The collected data is passed to a foundation model, which is prompted to identify anomalies, risks, and suspicious patterns. Red flags include files exfiltrated after hours, transfers to external domains, and sensitive data moved in ways that violate company policies.
  • Structured Reporting: The workflow then formats the findings into a specific report structure, including risk assessment, key findings, notable events, pattern analysis, and actionable recommendations.
  • Automated Delivery: This report is automatically sent to the user’s inbox.
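The steps above can be approximated with a small rule-based stand-in. In the workflow described, a foundation model does the analysis; this sketch swaps in two fixed rules (after hours, external destination) purely to show the data flow from events to a structured report. The event fields, the `INTERNAL_DOMAINS` set, the working-hours range, and the report keys are all hypothetical, and the report covers only part of the structure listed above.

```python
from datetime import datetime
from typing import Dict, List

INTERNAL_DOMAINS = {"corp.example"}  # hypothetical allow-list
WORK_HOURS = range(8, 18)            # assumed policy: 08:00 to 17:59


def red_flags(event: Dict[str, str]) -> List[str]:
    """Return the rule names an egress event trips."""
    flags = []
    if datetime.fromisoformat(event["time"]).hour not in WORK_HOURS:
        flags.append("after hours")
    if event["destination"] not in INTERNAL_DOMAINS:
        flags.append("external destination")
    return flags


def build_report(events: List[Dict[str, str]]) -> dict:
    """Summarize a week of events into a small structured report."""
    notable = []
    for event in events:
        flags = red_flags(event)
        if flags:
            notable.append({**event, "flags": flags})
    if any(e["classification"] == "sensitive" for e in notable):
        risk = "high"
    elif notable:
        risk = "medium"
    else:
        risk = "low"
    return {
        "risk_assessment": risk,
        "key_findings": f"{len(notable)} of {len(events)} events flagged",
        "notable_events": notable,
    }


if __name__ == "__main__":
    week = [
        {"time": "2025-06-16T10:05:00", "user": "alice", "filename": "notes.txt",
         "classification": "public", "destination": "corp.example"},
        {"time": "2025-06-16T23:10:00", "user": "bob", "filename": "payroll.xlsx",
         "classification": "sensitive", "destination": "files.example.net"},
    ]
    print(build_report(week))
```

A pre-filter like this could also be used to shrink the event set before prompting a model, though the source does not say whether the workflow does so.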

In essence, this application of agentic AI leverages intelligent reasoning and automation to proactively detect and report on potential data exfiltration risks, enabling organizations to stay ahead of threats.

https://reliaquest.com/security-operations-platform/

ReliaQuest’s GreyMatter is an AI-powered platform designed to simplify and enhance Security Operations (SecOps). It achieves this by unifying data from fragmented security tools and leveraging autonomous, agentic AI to accelerate processes from threat prevention to response.

Key functionalities and benefits of GreyMatter include:

  • Connected Security Tools: It integrates with existing security technologies, including those from partners like CrowdStrike, AWS, and Splunk, to provide a unified view and enhance visibility.
  • Prevention: Offers Digital Risk Protection for brand safeguarding, AI-driven asset discovery and risk prioritization (GreyMatter Discover), and tailored threat intelligence.
  • Detection: Aims to detect threats at their source, bypassing SIEM bottlenecks, and allows for single-click deployment and continuous validation of detections across various tools.
  • Containment: Automates threat containment in 5 minutes or less, significantly faster than most attacker lateral movement, with pre-built playbooks and the ability to isolate hosts, block IPs, or ban hashes.
  • Investigation: Utilizes a multi-agentic AI system for hands-free investigation and response, offering natural language processing for insights and pre-packaged threat hunts. It also includes an AI-powered Phishing Analyzer.
  • Response: Provides configurable, no-code automation workflows for repetitive tasks and allows for response actions directly from a mobile app, with real-time insights delivered via a notification center.

ReliaQuest highlights significant improvements for its customers, such as reductions in false-positive alerts and incident resolution time, alongside an increase in MITRE ATT&CK coverage. The platform is praised for providing “strategic flexibility” and enabling teams to “see everything happening on one platform” for quicker threat identification and response.

https://www.twinesecurity.com/

Coming out of stealth, cybersecurity startup Twine announced $12 million in seed funding, co-led by Ten Eleven Ventures and Dell Technologies Capital, with participation from angel investors including the founders of Wiz. Twine plans to address cybersecurity’s critical talent shortage by developing AI agents, or “digital employees,” to augment companies’ security teams. Alex, Twine’s first digital employee, is an expert in identity and access management (IAM).

Alex is deployed as a SaaS platform, connecting to different systems within the customer’s environment. “The user interacts with the Alex interface in order to ask him questions or assign tasks,” explains Benny Porat, Twine’s co-founder and CEO. “For any task assigned, Alex creates a plan, seeks approval, provides full visibility, and proceeds with an A-to-Z execution of the plan.”
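The plan, approval, and execution sequence Porat describes is a common pattern for keeping an agent under human control. Below is a minimal sketch of that pattern, not Twine's implementation: `run_task` and its three callbacks are hypothetical, and the IAM offboarding steps are invented for the example.

```python
from typing import Callable, List


def run_task(
    task: str,
    plan_fn: Callable[[str], List[str]],
    approve_fn: Callable[[List[str]], bool],
    execute_fn: Callable[[str], None],
) -> List[str]:
    """Plan a task, ask for approval, then execute every step and log it."""
    plan = plan_fn(task)
    log = [f"plan: {step}" for step in plan]  # full visibility before acting
    if not approve_fn(plan):
        log.append("rejected: nothing executed")
        return log
    for step in plan:
        execute_fn(step)
        log.append(f"done: {step}")
    return log


if __name__ == "__main__":
    executed: List[str] = []
    log = run_task(
        "offboard user jdoe",
        plan_fn=lambda task: ["disable account", "revoke sessions"],
        approve_fn=lambda plan: True,  # stands in for a human approver
        execute_fn=executed.append,    # stands in for real IAM calls
    )
    print("\n".join(log))
```

Nothing is executed until the whole plan has been shown and approved, which is the property that makes this kind of delegation auditable.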


Written by noailabs

Tech/biz consulting, analytics, research for founders, startups, corps and govs.
