Self-Hosted PentAGI: AI Agents That Run Real Pentest Tools (Authorized Targets Only)

Your quarterly pen test quote came back at $18,000 and the report was mostly nmap output dressed in PDF. Your junior dev ran sqlmap against staging once, got scared, and never documented what they found. You don't need another ChatGPT tab that hallucinates CVE numbers — you need an agent that actually runs the tools, in a sandbox, and writes a report you can hand to management.

PentAGI (Penetration testing Artificial General Intelligence) is that ambition in open source: an MIT-licensed project with roughly 18k GitHub stars, built around a multi-agent system that plans and executes security testing workflows inside isolated Docker containers. It uses a built-in suite of 20+ tools, including nmap, Metasploit, and sqlmap, and ships with a web UI, REST and GraphQL APIs, and vulnerability reports with exploitation notes. Controversial category? Sure. Useful for authorized security work? Also yes, if you treat it like a power tool and not a toy.

What it actually does

PentAGI is an autonomous AI penetration testing platform. You define a target scope and objective; the system breaks work into flows, tasks, and subtasks, delegates to specialized agents (research, development, execution), and runs commands in sandboxed environments.

Isolated execution. Security tools run in Docker — not on your laptop, not on production bare metal. The agent decides next steps; humans supervise through the web console and can intervene when execution monitoring flags something odd.

Built-in arsenal. Professional pentesting tools pre-integrated — port scanning, exploitation frameworks, web app testing, and more. The agent picks containers and images based on task type instead of you manually chaining shell one-liners.

Memory and knowledge graph. PostgreSQL with pgvector stores observations and embeddings. Graphiti + Neo4j tracks semantic relationships across a test — "this subdomain led to that misconfiguration" survives across sessions instead of dying in chat history.

Web intelligence. An isolated scraper browser plus search integrations (DuckDuckGo, Tavily, SearxNG, Sploitus, and others) let agents pull current exploit intel and documentation instead of relying only on what the model knew at its training cutoff.

Reporting. Flow reports viewable in the web UI, copyable, downloadable as Markdown or PDF. Detailed enough to brief a client; you still review before sending anything external.

LLM flexibility. OpenAI, Anthropic, Gemini, Bedrock, Ollama, DeepSeek, Qwen, custom OpenAI-compatible endpoints — including guides for local vLLM deployments. Point sensitive engagements at a Canadian-hosted Ollama box and keep target details off US API logs.

PentAGI vs Langfuse vs SafeLine

We've covered adjacent pieces:

  • Langfuse — LLM tracing and evals; PentAGI integrates with Langfuse so you can watch agent reasoning and token spend during a test
  • SafeLine — defensive WAF in front of your apps; PentAGI is offense-side testing (again: authorized targets only)
  • Open WebUI / Onyx — general chat and RAG; not built to run Metasploit in a sandbox

PentAGI is for security engineers who want AI-assisted execution, not just AI-assisted note-taking. It is not yet a breach and attack simulation (BAS) platform with predefined attack campaigns, and the README is honest about that boundary.

Why self-host?

Target data stays inside your perimeter. Pentest flows include IPs, hostnames, vulnerability details, command output, and sometimes credentials from test environments. That belongs on infrastructure you control — not a vendor's multi-tenant cloud in another jurisdiction.

Local models for sensitive engagements. Run Ollama or vLLM on the same Canadian VPS, and agent prompts and tool output never reach a hosted LLM API, so no third-party retention policy applies to them. Client NDA and PIPEDA conversations get easier.
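
One way to keep yourself honest here is a guard that refuses to proceed unless the configured LLM endpoint sits on a loopback or private address. The sketch below is an illustration, not part of PentAGI: the function name is_private_endpoint is made up, it only understands localhost and literal IPv4 addresses, and it does no DNS lookup.

```shell
# Hypothetical guard: succeed only if an LLM endpoint URL points at
# localhost or an RFC 1918 private IPv4 address.
is_private_endpoint() {
  # $1 = base URL, e.g. http://127.0.0.1:11434
  host=${1#*://}     # drop the scheme
  host=${host%%/*}   # drop any path
  host=${host%%:*}   # drop the port (IPv6 literals and user@host are not handled)

  case $host in
    localhost) return 0 ;;
    *[!0-9.]*) return 1 ;;   # any other name: treated as public, no DNS lookup
  esac

  case $host in
    127.*|10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
  esac
  return 1
}
```

Run it as is_private_endpoint http://127.0.0.1:11434 (11434 is Ollama's default port) before saving that URL in the provider settings. Hostnames other than localhost fail on purpose, so a name that happens to resolve to a private address still needs a human to confirm it.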

Full observability stack included. Grafana, VictoriaMetrics, Jaeger, Loki — watch what the agents actually did. Audit trails matter when you're explaining to a client why something was scanned.

MIT license. Inspect the code, fork if needed, run air-gapped if your policy demands it. VXControl Cloud add-ons are optional paid services — core PentAGI self-hosts without them.

Legal line — read this twice. Only aim PentAGI at systems you own or have explicit written authorization to test. Unauthorized scanning is illegal in Canada and most jurisdictions. Self-hosting doesn't change that. Treat scope documents like production firewall rules.
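
Treating the scope document like a firewall rule can be as literal as a default-deny allowlist that every target must pass before a flow is created. A minimal sketch, assuming a hypothetical scope.txt with one authorized hostname or IP per line and # for comments. in_scope is an illustrative helper, not a PentAGI feature, and it matches exact entries only (no CIDR ranges, no wildcards).

```shell
# Hypothetical default-deny scope gate.
# Exit status: 0 = target is listed, 1 = not listed, 2 = bad arguments.
in_scope() {
  # $1 = target hostname or IP, $2 = path to the scope file
  [ -n "$1" ] && [ -r "$2" ] || return 2

  # Strip comments and surrounding whitespace, then require an exact,
  # whole-line, fixed-string match so 10.0.5.1 never matches 10.0.5.12.
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$2" \
    | grep -Fxq -- "$1"
}
```

A wrapper script could call in_scope "$target" scope.txt || exit 1 before submitting anything to the agent. Exact matching is deliberately strict: widening scope should mean editing the file, which leaves a reviewable change.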

What running it takes

Official minimums: 2 vCPU, 4 GB RAM, 20 GB disk, Docker Compose. Realistically the full stack — PostgreSQL, Neo4j, Grafana, Langfuse components, agent workers — wants more headroom. Don't run this on the same $5 box as your blog.
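
Before running the installer it is cheap to check the host against those minimums. The sketch below assumes a Linux host with nproc, /proc/meminfo, and df available. The function names and the 3800 MB RAM threshold are assumptions: a nominal 4 GB machine usually reports a little less than 4096 MB in MemTotal.

```shell
# Hypothetical pre-flight check against the stated minimums
# (2 vCPU, 4 GB RAM, 20 GB disk).
MIN_CPUS=2
MIN_RAM_MB=3800   # assumption: leaves room for kernel-reserved memory on a 4 GB host
MIN_DISK_GB=20

meets_minimums() {
  # $1 = CPU count, $2 = total RAM in MB, $3 = free disk in GB
  # Prints one line per shortfall; returns 1 if anything is short.
  short=0
  [ "$1" -ge "$MIN_CPUS" ]    || { echo "cpu: $1 < $MIN_CPUS"; short=1; }
  [ "$2" -ge "$MIN_RAM_MB" ]  || { echo "ram: $2 MB < $MIN_RAM_MB MB"; short=1; }
  [ "$3" -ge "$MIN_DISK_GB" ] || { echo "disk: $3 GB < $MIN_DISK_GB GB"; short=1; }
  return "$short"
}

probe_host() {
  # $1 = directory whose filesystem should be measured (default: current directory).
  # Pass Docker's data directory if it lives on a separate volume.
  cpus=$(nproc)
  ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  disk_gb=$(df -Pk "${1:-.}" | awk 'NR == 2 {print int($4 / 1048576)}')
  echo "$cpus $ram_mb $disk_gb"
}
```

Usage is meets_minimums $(probe_host), left unquoted so the three numbers split into separate arguments. Passing the check only means the installer should start; it says nothing about headroom for the full stack.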

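# Work in a dedicated directory (-p makes re-runs safe)
mkdir -p pentagi && cd pentagi
# Download and unpack the Linux amd64 installer
wget -O installer.zip https://pentagi.com/downloads/linux/amd64/installer-latest.zip
unzip installer.zip
# Start the interactive installer (sudo because it needs Docker socket access)
sudo ./installer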

The interactive installer walks through LLM provider setup, search engine config, credential generation, and docker-compose deployment. It needs access to the Docker socket, which is effectively root on the host, so production installs should treat the machine as privileged infrastructure.

Configure providers in the web UI after login (Settings → Providers). Bearer tokens for REST/GraphQL automation live under Settings → PentAGI API. Integrate Langfuse if you want LLM-level tracing alongside system metrics in Grafana.
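
Once a token exists, a quick authenticated round trip confirms that automation can reach the API. This is a sketch under assumptions: PENTAGI_URL and PENTAGI_TOKEN are hypothetical environment variables, and PENTAGI_URL must hold the full GraphQL endpoint of your instance (the exact path is not given here, so check the API documentation). The query uses only the standard GraphQL __typename meta-field, so it relies on nothing specific to PentAGI's schema.

```shell
# Hypothetical smoke test for a bearer token against a GraphQL endpoint.
pentagi_ping() {
  if [ -z "${PENTAGI_URL:-}" ] || [ -z "${PENTAGI_TOKEN:-}" ]; then
    echo "set PENTAGI_URL and PENTAGI_TOKEN first" >&2
    return 2
  fi

  # --fail turns HTTP errors (for example a rejected token) into a non-zero exit.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${PENTAGI_TOKEN}" \
    --header "Content-Type: application/json" \
    --data '{"query":"{ __typename }"}' \
    "${PENTAGI_URL}"
}
```

Passing the token on the command line exposes it to other users in the process list, so for anything beyond a one-off check consider putting the header in a curl config file read with --config.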

Who it's for (and who should skip it)

Good fit:

  • MSSPs and internal red teams scaling repetitive recon workflows
  • Security researchers prototyping autonomous testing in lab environments
  • Consultants who want draft reports faster (with human review)
  • Orgs requiring Canadian data residency for engagement artifacts

Maybe skip it:

  • You don't have written authorization processes: fix that first
  • You need checkbox compliance scanning, not adaptive agents: traditional tools may fit better
  • You won't babysit autonomous command execution: this is not fire-and-forget against production
  • You're looking for a chatbot: use Open WebUI instead

Hosting it in Canada

PentAGI is a heavy, privileged Docker deployment. It needs isolated network segments, backups of PostgreSQL and Neo4j, and strict access controls on the host. We run security tooling on Canadian Docker hosting with the understanding that this box holds sensitive test data.

Tell us your team size and whether you're running local LLMs, and we'll size RAM and disk for the full agent stack, not just the minimum in the README.
