VEXT Pentest-7B -- The First Open-Source Security AI Model

Pentest-7B is a 7-billion-parameter language model fine-tuned specifically for offensive security, penetration testing, and vulnerability analysis. Built by VEXT Labs on top of Qwen2.5-7B-Instruct and trained on 260,000+ curated security examples drawn from real-world engagements, this is the first open-weight model purpose-built for the security profession.

Pentest-7B runs on a single consumer GPU (16 GB VRAM), a MacBook with 16 GB RAM via Ollama, or CPU-only with quantized weights. No API keys, no cloud dependency, and no data ever leaves your machine.

Key Capabilities

| Capability | Description |
| --- | --- |
| Vulnerability Explanation | Given a CVE ID, CWE, or raw scan output, produce a clear technical explanation of the vulnerability, its root cause, and real-world impact. |
| Pentest Report Writing | Generate executive summaries, technical finding write-ups, risk ratings, and remediation sections in standard pentest report format. |
| Attack Strategy Planning | Given a target technology stack, suggest prioritized attack paths aligned with MITRE ATT&CK and OWASP Testing Guide methodologies. |
| Remediation Guidance | Provide specific, actionable fix recommendations with code examples for common vulnerability classes. |
| Compliance Assessment | Map findings to compliance frameworks (PCI DSS, SOC 2, HIPAA, ISO 27001) and articulate control gaps. |
| Threat Briefing | Summarize threat intelligence, emerging CVEs, and APT campaign TTPs for stakeholder communication. |
| Security Code Review | Analyze code snippets for injection flaws, authentication bypasses, insecure deserialization, and other OWASP Top 10 issues. |

Training

Data

Pentest-7B was trained on 260,000+ curated examples spanning:

  • Production pentesting traces -- Real (anonymized) action-observation pairs from VEXT's autonomous security agents running against authorized bug bounty targets. Includes successful exploitation chains, false positive patterns, and tool output interpretation.
  • CTF challenge solutions -- Structured walkthroughs from capture-the-flag competitions covering web, pwn, crypto, reverse engineering, and forensics categories.
  • Bug bounty write-ups -- Public responsible disclosure reports with structured vulnerability descriptions, reproduction steps, and impact assessments.
  • MITRE ATT&CK corpus -- Technique descriptions, procedure examples, detection guidance, and mitigation strategies across all 14 tactics.
  • OWASP materials -- Testing Guide procedures, ASVS requirements, cheat sheets, and vulnerability classifications.
  • CVE analysis -- Detailed analysis of 50,000+ CVEs including root cause, affected versions, exploit conditions, and patch diffs.
  • DPO preference pairs -- 2,000+ pairs where validated real findings are preferred over false positives, teaching the model to distinguish true vulnerabilities from noise.
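A DPO preference record of the kind described might look like the following. This is a hypothetical schema for illustration; the actual training format is not published:

```python
# Hypothetical DPO preference pair: the "chosen" response is a validated
# finding, the "rejected" response is a plausible-sounding false positive.
preference_pair = {
    "prompt": (
        "A scanner flags SQL injection in the `id` parameter of /product. "
        "Assess whether this is a real finding."
    ),
    "chosen": (
        "Confirmed: a time-based verification payload in the id parameter "
        "delays the response by the requested interval, indicating blind "
        "SQL injection. Severity: high; report with reproduction steps."
    ),
    "rejected": (
        "The scanner alert alone confirms SQL injection; report it as "
        "critical without further verification."
    ),
}

# The pair is keyed exactly by prompt/chosen/rejected, the shape
# expected by common DPO trainers.
assert set(preference_pair) == {"prompt", "chosen", "rejected"}
```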

What is NOT in the training data: Raw exploit code, weaponized payloads, malware source, credentials, PII, or any data that could be directly used for unauthorized access. The model is trained to reason about security, not to serve as an exploit toolkit.

Architecture and Training Pipeline

Qwen2.5-7B-Instruct (base)
    |
    v
QLoRA Fine-Tuning (SFT)
  - Rank: 16, Alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj
  - 3 epochs, effective batch size 32
  - Max sequence length: 4096 tokens
  - Learning rate: 2e-4, cosine schedule
    |
    v
DPO Alignment
  - Beta: 0.1, sigmoid loss
  - 1 epoch, learning rate 5e-6
  - Preference signal: validated findings (chosen) vs false positives (rejected)
    |
    v
Adapter Merge + AWQ 4-bit Quantization (optional)
    |
    v
VEXT Pentest-7B
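The hyperparameters in the diagram can be collected into a single config sketch. The values are copied from the pipeline above; the dict layout itself is illustrative, not VEXT's actual training script:

```python
# SFT (QLoRA) hyperparameters as listed in the pipeline diagram.
SFT_CONFIG = {
    "base_model": "Qwen/Qwen2.5-7B-Instruct",
    "lora_rank": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "epochs": 3,
    "effective_batch_size": 32,
    "max_seq_len": 4096,
    "learning_rate": 2e-4,
    "lr_schedule": "cosine",
}

# DPO alignment hyperparameters.
DPO_CONFIG = {
    "beta": 0.1,
    "loss": "sigmoid",
    "epochs": 1,
    "learning_rate": 5e-6,
}
```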

Hardware

  • SFT: 8x NVIDIA A100 40GB (SageMaker ml.p4d.24xlarge), ~18 hours
  • DPO: 8x NVIDIA A100 40GB, ~4 hours
  • Quantization: Single A10G 24GB (SageMaker ml.g5.2xlarge)

Usage

Transformers (Full Precision)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vext-labs/pentest-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert penetration tester and security analyst. "
            "Provide detailed, technically accurate security guidance."
        ),
    },
    {
        "role": "user",
        "content": (
            "I found a reflected XSS in a search parameter on an e-commerce site "
            "during a bug bounty engagement. The input is reflected inside a "
            "JavaScript string literal in the response. Write the finding for my "
            "pentest report, including severity rating, impact, and remediation."
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

vLLM (Production Serving)

from vllm import LLM, SamplingParams

llm = LLM(
    model="vext-labs/pentest-7b",
    tensor_parallel_size=1,       # single GPU
    max_model_len=4096,
    gpu_memory_utilization=0.90,
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

prompts = [
    "Explain CVE-2024-3094 (XZ Utils backdoor) — root cause, impact, and detection methods.",
    "Given an exposed .git directory on a production web server, outline an attack plan.",
]

outputs = llm.generate(prompts, sampling)
for output in outputs:
    print(output.outputs[0].text)

OpenAI-compatible API with vLLM:

vllm serve vext-labs/pentest-7b --port 8000

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="vext-labs/pentest-7b",
    messages=[
        {"role": "system", "content": "You are a senior penetration tester."},
        {"role": "user", "content": "Analyze this Nmap output and suggest next steps:\n\nPORT    STATE SERVICE  VERSION\n22/tcp  open  ssh      OpenSSH 7.4\n80/tcp  open  http     Apache 2.4.6\n443/tcp open  ssl/http Apache 2.4.6\n3306/tcp open mysql    MySQL 5.7.38"},
    ],
    temperature=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)

Ollama (Local, Quantized)

# Pull the model (GGUF Q4_K_M quantization, ~4.5 GB)
ollama pull vext-labs/pentest-7b

# Interactive chat
ollama run vext-labs/pentest-7b

# API
curl http://localhost:11434/api/chat -d '{
  "model": "vext-labs/pentest-7b",
  "messages": [
    {"role": "user", "content": "What are the top 5 things to check when auditing a JWT implementation?"}
  ]
}'
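The same /api/chat call can be made from Python with only the standard library. This is a sketch assuming a local Ollama daemon on the default port; `build_chat_payload` and `chat` are helper names introduced here, not part of Ollama:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint


def build_chat_payload(model: str, messages: list, stream: bool = False) -> bytes:
    """Assemble the JSON body for Ollama's /api/chat endpoint.

    With "stream": false, Ollama returns a single JSON object instead
    of a stream of chunks.
    """
    return json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode()


def chat(model: str, prompt: str) -> str:
    payload = build_chat_payload(model, [{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the reply under message.content.
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    print(chat(
        "vext-labs/pentest-7b",
        "What are the top 5 things to check when auditing a JWT implementation?",
    ))
```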

Docker (Isolated Serving)

docker run --gpus all -p 8000:8000 \
  ghcr.io/vext-labs/pentest-7b:latest \
  --model vext-labs/pentest-7b --port 8000

Telemetry

Pentest-7B includes an opt-in telemetry collector to help us improve the model. It is off by default and collects only anonymized aggregate statistics (vulnerability categories, tool success rates, session metadata). It never collects URLs, IPs, credentials, vulnerability details, request/response bodies, file paths, or user identity.

# Enable (opt-in)
export VEXT_TELEMETRY=on

# Disable (default)
export VEXT_TELEMETRY=off

# See exactly what is collected
python -c "from vext_telemetry import what_we_collect; what_we_collect()"

Full telemetry source code is included in the repository for audit: telemetry/collector.py.
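The allow-list behavior described above might be sketched roughly as follows. This is a hypothetical illustration with made-up field names; the authoritative implementation is telemetry/collector.py in the repository:

```python
# Hypothetical allow-list scrubber: only aggregate, non-identifying fields
# survive; everything else (URLs, IPs, bodies, paths, identities) is dropped
# before an event leaves the machine.
ALLOWED_FIELDS = {"vuln_category", "tool", "tool_success", "session_duration_s"}


def scrub_event(raw_event: dict) -> dict:
    """Keep only allow-listed keys; never forward anything else."""
    return {k: v for k, v in raw_event.items() if k in ALLOWED_FIELDS}


event = scrub_event({
    "vuln_category": "xss",
    "tool": "nuclei",
    "tool_success": True,
    "target_url": "https://example.com/search?q=1",  # dropped
    "request_body": "q=<script>...</script>",        # dropped
})
assert "target_url" not in event and "request_body" not in event
```

An allow-list (rather than a deny-list) means any field the collector was never taught about is dropped by default, which is the safer failure mode for telemetry.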

Evaluation

| Benchmark | Pentest-7B | Qwen2.5-7B-Instruct (base) | GPT-4o (API) |
| --- | --- | --- | --- |
| SecBench (vuln classification) | 82.4% | 61.2% | 79.8% |
| CyberMetric (security knowledge) | 74.1% | 52.7% | 71.3% |
| PentestQA (methodology) | 88.6% | 44.3% | 83.1% |
| Finding Quality (human eval, 1-5) | 4.2 | 2.1 | 4.4 |
| False Positive Rate (lower is better) | 12.3% | 41.7% | 15.8% |

Benchmarks run with temperature=0, greedy decoding. Human evaluation by 3 senior pentesters on 200 randomly sampled findings.

Intended Use

This model is built for authorized security professionals:

  • Penetration testers writing reports and planning engagements
  • Bug bounty hunters analyzing targets and drafting submissions
  • Security engineers triaging vulnerabilities and planning remediation
  • SOC analysts interpreting alerts and assessing threat severity
  • Compliance teams mapping findings to regulatory frameworks
  • Security researchers studying vulnerability patterns

Limitations and Responsible Use

  • Not a replacement for human expertise. Always validate model outputs with manual testing and professional judgment.
  • Authorization required. Do not use this model's output to test systems without explicit written authorization from the system owner.
  • No guarantee of accuracy. The model can hallucinate CVE details, suggest inapplicable techniques, or miss critical context. Treat outputs as a starting point, not a final answer.
  • Scope of training. The model is strongest on web application security, network infrastructure, and common vulnerability classes. It has limited depth on hardware security, ICS/SCADA, mobile reversing, and cryptographic implementation review.
  • Not an exploit generator. The model is trained to reason about security concepts, not to produce weaponized code. Attempts to extract raw exploit payloads will produce lower-quality outputs by design.

License

Apache 2.0. Use it, modify it, deploy it commercially. Attribution appreciated but not required.

Citation

@misc{vext-pentest-7b-2026,
  title   = {VEXT Pentest-7B: An Open-Source Language Model for Penetration Testing and Security Analysis},
  author  = {VEXT Labs},
  year    = {2026},
  url     = {https://huggingface.co/vext-labs/pentest-7b},
  note    = {Fine-tuned from Qwen2.5-7B-Instruct on 260K+ curated security examples with QLoRA SFT and DPO alignment},
}

Built By

VEXT Labs, Inc. -- Building autonomous security testing infrastructure. Pentest-7B is the open-source foundation of our platform's security reasoning capabilities.


If you use Pentest-7B in your research or product, we would love to hear about it. Open an issue or reach out at oss@tryvext.com.
