Apriel-1.5-15b-Thinker-CYBERSEC-MERGED

This is a fully merged, production-ready model based on ServiceNow-AI/Apriel-1.5-15b-Thinker, fine-tuned for cybersecurity network traffic analysis and intrusion detection.

Model Description

Developed by: Sainikhil Juluri
Model type: Vision-Language Model (LLaVA-based, 15B parameters)
Language(s): English
License: Apache 2.0
Finetuned from: ServiceNow-AI/Apriel-1.5-15b-Thinker

This model combines a large vision-language model with specialized cybersecurity training, using DoRA (Weight-Decomposed Low-Rank Adaptation) and RAFT (Retrieval Augmented Fine-Tuning) methodologies.

Model Type: Full Merged Model

✅ This is a complete, production-ready model with adapters fully merged into the base model. It can be used directly via Inference Endpoints or loaded with standard transformers code.
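
For reference, a merge of this kind is conventionally produced with PEFT's merge_and_unload(); the sketch below mirrors that standard workflow rather than the author's exact script (the adapter path is a placeholder):

import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq

# Load the base model, attach the trained DoRA adapter, then fold the adapter
# weights into the base weights so no PEFT dependency is needed at inference.
base = AutoModelForVision2Seq.from_pretrained(
    "ServiceNow-AI/Apriel-1.5-15b-Thinker",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "path/to/dora-adapter").merge_and_unload()
merged.save_pretrained("Apriel-1.5-15b-Thinker-CYBERSEC-MERGED")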

Training Details

Training Data

Dataset: NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases)

  • Training examples: 49,997
  • Data distribution:
    • Normal Traffic: 53.5%
    • DoS (Denial of Service): 36.5%
    • Probe: 9.3%
    • R2L (Remote to Local): 0.8%
    • U2R (User to Root): 0.04%
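
To illustrate how tabular NSL-KDD records can be rendered as text for a language model, here is a hypothetical conversion helper; the field names below are genuine NSL-KDD features, but the actual prompt template used for this model is not published in this card:

# Hypothetical sketch: render selected NSL-KDD features as a text prompt.
def kdd_record_to_prompt(record):
    return (
        f"Connection: protocol={record['protocol_type']}, "
        f"service={record['service']}, flag={record['flag']}, "
        f"duration={record['duration']}s, "
        f"src_bytes={record['src_bytes']}, dst_bytes={record['dst_bytes']}.\n"
        "Classify this traffic as Normal, DoS, Probe, R2L, or U2R."
    )

# Example record resembling a SYN flood (S0 flag, zero byte counts)
example = {
    "protocol_type": "tcp", "service": "http", "flag": "S0",
    "duration": 0, "src_bytes": 0, "dst_bytes": 0,
}
print(kdd_record_to_prompt(example))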

Training Strategy: RAFT (Retrieval Augmented Fine-Tuning)

The model was trained using RAFT methodology with three modes:

  • Oracle Mode (19.9%): Learning from relevant documents
  • Distractor Mode (60.4%): Learning to recognize and ignore irrelevant context
  • No Context (19.8%): Learning to generate without external context

This approach teaches the model to:

  1. Generate responses with proper citations
  2. Distinguish relevant from irrelevant information
  3. Function effectively in RAG (Retrieval Augmented Generation) systems
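
As a concrete illustration, RAFT-style training examples can be assembled roughly as follows. This is a minimal sketch: the helper, the three-document context size, and the distractor pool are assumptions, while the mode weights follow the percentages above.

import random

def build_raft_example(question, answer, oracle_doc, distractor_docs):
    """Assemble one RAFT training example using the card's mode ratios."""
    mode = random.choices(
        ["oracle", "distractor", "no_context"],
        weights=[0.199, 0.604, 0.198],
    )[0]
    if mode == "oracle":
        # Relevant document present: the target answer can cite it.
        docs = [oracle_doc] + random.sample(distractor_docs, 2)
    elif mode == "distractor":
        # Only irrelevant documents: the model must not be misled by them.
        docs = random.sample(distractor_docs, 3)
    else:
        # No context: the model answers from parametric knowledge alone.
        docs = []
    random.shuffle(docs)
    context = "\n\n".join(f"Document {i+1}: {d}" for i, d in enumerate(docs))
    prompt = f"Context:\n{context}\n\nQuestion: {question}" if docs else f"Question: {question}"
    return {"prompt": prompt, "completion": answer, "mode": mode}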

Training Configuration

Fine-tuning Method: DoRA (Weight-Decomposed Low-Rank Adaptation)

  • Total Parameters: 15B (the training run reported 8,416,026,624, likely because 4-bit quantization packs weights and roughly halves the apparent count)
  • Trainable Parameters: 275.8M (3.28% of the reported count)
  • LoRA Rank: 64
  • LoRA Alpha: 128
  • Target Modules: 7 module types (see the configuration sketch below for a typical choice)
  • Vision Components: Frozen (217M parameters)
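
In PEFT terms, this setup corresponds roughly to the following LoraConfig sketch. use_dora=True enables DoRA; the listed target module names are an assumption based on common practice, since the card does not name them:

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                    # LoRA rank
    lora_alpha=128,          # LoRA alpha
    use_dora=True,           # Weight-Decomposed Low-Rank Adaptation
    target_modules=[         # assumed 7 projection module types
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
# get_peft_model(base_model, lora_config) would then wrap the text decoder,
# with the vision tower left frozen.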

Training Hyperparameters:

  • Epochs: 1
  • Training Steps: 3,125
  • Learning Rate: 2e-5
  • Batch Size (device): 4
  • Gradient Accumulation: 4
  • Effective Batch Size: 16
  • Optimizer: Paged AdamW
  • Precision: 4-bit quantization (QLoRA)
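
These hyperparameters map onto a standard transformers/bitsandbytes setup along these lines. This is a sketch under common QLoRA conventions; values not stated in the card, such as the NF4 quant type, are assumptions:

import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit base weights (QLoRA); NF4 with bf16 compute is the usual choice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

training_args = TrainingArguments(
    output_dir="apriel-cybersec-dora",
    num_train_epochs=1,
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch size 16
    optim="paged_adamw_8bit",        # paged AdamW from bitsandbytes
    bf16=True,
)
# 49,997 examples / effective batch 16 ≈ 3,125 steps, matching the card.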

Training Performance:

  • Initial Loss: 3.14
  • Final Loss: 0.038-0.092
  • Convergence: Excellent (97% loss reduction)
  • Training Duration: 12 hours
  • Hardware: NVIDIA A100 GPU (40GB)
  • Platform: Google Colab Pro

Intended Uses

Direct Use

This model is designed for:

  • Network traffic analysis and intrusion detection
  • Cybersecurity threat classification
  • Security incident response support
  • Educational purposes in cybersecurity training
  • RAG-based cybersecurity question answering systems

Attack Detection Capabilities

The model can identify and analyze:

  • DoS/DDoS attacks: Denial of Service and Distributed Denial of Service
  • Probe attacks: Port scanning, vulnerability scanning
  • R2L attacks: Remote to Local unauthorized access attempts
  • U2R attacks: User to Root privilege escalation
  • Normal traffic: Baseline network behavior
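
Downstream systems often need a discrete label rather than free-form analysis. An illustrative (hypothetical) post-processing helper for mapping a response onto these five categories:

LABELS = ["DoS", "Probe", "R2L", "U2R", "Normal"]

def extract_label(response):
    """Return a category named in the response, checking attack classes first."""
    lowered = response.lower()
    for label in LABELS:
        if label.lower() in lowered:
            return label
    return "Unknown"  # no category named; flag for human review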

Out-of-Scope Use

⚠️ This model should NOT be used:

  • As the sole authority for security decisions without human oversight
  • For real-time critical infrastructure protection without validation
  • On network architectures or attack vectors not represented in NSL-KDD
  • For production security without thorough testing and validation

Usage

Basic Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained(
    "sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED",
    trust_remote_code=True
)

# Prepare conversation
messages = [
    {
        "role": "system",
        "content": "You are a cybersecurity expert specializing in network intrusion detection and analysis."
    },
    {
        "role": "user",
        "content": "Based on the provided network traffic analysis documents, identify potential security threats in this connection pattern."
    }
]

# Generate response
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.95
)

response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)

RAG Integration

This model is optimized for RAG (Retrieval Augmented Generation) workflows:

# Example with document context (reuses the model and processor loaded above)
documents = [
    "Network traffic shows 50 SYN packets per second...",
    "Connection attempts from IP 192.168.1.100...",
]

# Number each retrieved document so the model can cite it
context = "\n\n".join(f"Document {i+1}: {doc}" for i, doc in enumerate(documents))

messages = [
    {"role": "system", "content": "You are a cybersecurity expert. Cite sources when analyzing."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: What type of attack is this?"}
]

# Generate a cited response, as in the basic usage example
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, top_p=0.95)
print(processor.decode(outputs[0], skip_special_tokens=True))

Limitations

  1. Dataset Specificity: Trained on NSL-KDD patterns; may not generalize to all network architectures
  2. Text-Only Training: Vision capabilities were frozen during fine-tuning
  3. Temporal Coverage: Training data may not reflect the latest attack vectors
  4. Citation Dependency: Trained for RAG workflows; works best with document context
  5. Language: English only; multilingual capabilities not validated

Bias, Risks, and Limitations

Known Biases

  • Attack Type Imbalance: Heavy bias toward DoS (36.5%) and Normal traffic (53.5%); limited exposure to U2R attacks (0.04%)
  • Synthetic Data: NSL-KDD is derived from older network patterns; may not reflect modern cloud/IoT environments

Risks

  • False Positives/Negatives: Should not be sole arbiter of security decisions
  • Adversarial Robustness: Not explicitly trained against adversarial attacks
  • Evolving Threats: Requires continuous updating for new attack patterns

Recommendations

Users should:

  • ✅ Use as a decision support tool alongside human expertise
  • ✅ Validate outputs in production environments
  • ✅ Regularly update with new threat intelligence
  • ✅ Test thoroughly on their specific network architecture
  • ✅ Implement proper monitoring and feedback loops

Evaluation

Training Performance

  • Loss Convergence: 3.14 → 0.038-0.092 (97% reduction)
  • Training Stability: Consistent convergence across 3,125 steps
  • Checkpoint Consistency: Stable performance maintained throughout training

Validation Approach

The model uses RAFT methodology which inherently validates:

  • Ability to identify relevant vs. irrelevant documents
  • Citation accuracy and source attribution
  • Context-aware response generation
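
One lightweight check consistent with this validation focus is verifying that cited document numbers actually exist in the supplied context. An illustrative helper (not from the card):

import re

def cited_documents(answer):
    """Collect the document numbers referenced as 'Document N' in an answer."""
    return {int(n) for n in re.findall(r"Document (\d+)", answer)}

def citations_valid(answer, num_docs):
    """True when every cited document index is within the supplied context."""
    return all(1 <= n <= num_docs for n in cited_documents(answer))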

Technical Specifications

Model Architecture

  • Base: Apriel-1.5-15b-Thinker (LLaVA-based architecture)
  • Vision Encoder: Frozen during training
  • Text Decoder: Fine-tuned with DoRA adapters
  • Precision: BFloat16
  • Context Length: 578 tokens (maximum sequence length used in training, sized to the training data)

Compute Infrastructure

Hardware:

  • GPU: NVIDIA A100 (40GB VRAM)
  • Platform: Google Colab Pro
  • Training Time: 12 hours
  • Estimated Cost: ~$24 (A100 @ $1.95/hr)

Software:

  • Framework: HuggingFace Transformers (v4.46.0)
  • PEFT: v0.17.0
  • Training: TRL + bitsandbytes (4-bit quantization)
  • PyTorch: Latest stable

Environmental Impact

Estimated Carbon Emissions:

  • Training Duration: 12 hours on A100 GPU
  • Cloud Provider: Google Cloud Platform
  • Estimated emissions: ~5-6 kg CO2eq (based on average cloud GPU usage)

Note: This is a conservative estimate. Actual emissions depend on datacenter location and energy sources.

Citation

If you use this model in your research or applications, please cite:

@misc{apriel-cybersec-2025,
  author = {Juluri, Sainikhil},
  title = {Apriel Cybersecurity Model: DoRA + RAFT Fine-tuned for Network Intrusion Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED}}
}

Acknowledgments

  • Base Model: ServiceNow-AI for Apriel-1.5-15b-Thinker
  • Dataset: NSL-KDD (Canadian Institute for Cybersecurity)
  • Methodology: DoRA (Liu et al.) and RAFT (Zhang et al.)
  • Training Platform: Google Colab Pro

Model Card Contact

Author: Sainikhil Juluri
GitHub: [Include if public]
Email: [Include if public]
Project: Cybersecurity AI System (College Project)

For questions, issues, or collaboration opportunities, please open an issue on the model repository or contact via HuggingFace.


Last Updated: November 2025
Model Version: 1.0
Status: Production Ready ✅
