
Horizon3.ai today announced that NodeZero®, its autonomous penetration testing platform, is the first AI to fully solve the Game of Active Directory (GOAD), a respected benchmark for Active Directory exploitation. NodeZero completed the challenge in just 14 minutes.

GOAD, developed by Orange Cyberdefense, simulates a realistic multi-domain enterprise network with the same trust abuses, misconfigurations, and security controls attackers exploit in the wild. Solving it requires chaining reconnaissance, credential abuse, privilege escalation, lateral movement, and persistence across multiple hosts and domains.

Recent Carnegie Mellon University research underscores how difficult this is: state-of-the-art LLMs such as GPT-4o, Gemini 2.5 Pro, and Claude 3.7 Sonnet, even with advanced prompting frameworks, failed to reliably execute multi-host intrusions, capturing less than 30% of attack graph states in labs capped at 50 hosts.

Why GOAD is Hard

For both humans and algorithms, GOAD is a stress test of scale, reasoning, and persistence. Attack paths are not linear and require maintaining multi-hop memory across dozens of steps, adapting execution priorities based on partial successes, and exploiting inter-domain trust boundaries under realistic constraints.

  • For expert human pentesters: Completing GOAD typically takes 12–16 hours of sustained effort, deep AD exploitation expertise, and careful sequencing of tools and tactics.
  • For algorithms and LLMs: The complexity forces reasoning systems to juggle conditional execution, state tracking, and dynamic reprioritization, all capabilities where current AI models fail.

NodeZero’s solve time of 14 minutes is more than 50 times faster than the 12–16 hours an expert human typically needs, and the company reports perfect execution of the full attack chain from initial foothold to complete domain compromise.
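The speedup figure can be checked against the numbers quoted above. The short Python sketch below is only a back-of-the-envelope calculation; the function and constant names are illustrative and do not come from Horizon3.ai.

```python
# Back-of-the-envelope check of the speedup claim, using only the figures
# quoted in the announcement: 12-16 hours for an expert human, 14 minutes
# for NodeZero. All names here are illustrative, not part of any product API.

def speedup(human_hours: float, machine_minutes: float) -> float:
    """Return how many times shorter the machine run is than the human run."""
    return (human_hours * 60) / machine_minutes

NODEZERO_MINUTES = 14

low = speedup(12, NODEZERO_MINUTES)   # fastest quoted human time
high = speedup(16, NODEZERO_MINUTES)  # slowest quoted human time

print(f"{low:.1f}x to {high:.1f}x")  # prints "51.4x to 68.6x"
```

Under these figures the ratio falls between roughly 51 and 69, so "50 times faster" is a conservative rounding of the low end of the range.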
