Apriel-1.5-15b-Thinker-CYBERSEC-MERGED
This is a fully merged production model based on ServiceNow-AI/Apriel-1.5-15b-Thinker, fine-tuned for cybersecurity network traffic analysis and intrusion detection.
Model Description
Developed by: Sainikhil Juluri
Model type: Vision-Language Model (LLaVA-based, 15B parameters)
Language(s): English
License: Apache 2.0
Finetuned from: ServiceNow-AI/Apriel-1.5-15b-Thinker
This model adapts a large vision-language model to cybersecurity. It uses DoRA (Weight-Decomposed Low-Rank Adaptation) for parameter-efficient fine-tuning and RAFT (Retrieval Augmented Fine-Tuning) as the training recipe.
Model Type: Full Merged Model
✅ This is a complete, production-ready model with adapters fully merged into the base model. It can be used directly via Inference Endpoints or loaded with standard transformers code.
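The DoRA paper (Liu et al.) splits each adapted weight matrix into a learned magnitude vector and a direction. Merging folds the learned low-rank update back into a single dense matrix, so no adapter files are needed at load time. The merged weight is sketched below. It assumes PEFT's usual LoRA scaling of alpha/r, which is 128/64 = 2 with this card's settings. W_0 is the frozen base weight, B and A are the low-rank factors, m is the learned magnitude, and the norm is taken column-wise:

```latex
W' = m \odot \frac{W_0 + \tfrac{\alpha}{r} B A}{\left\lVert W_0 + \tfrac{\alpha}{r} B A \right\rVert_c},
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; m \in \mathbb{R}^{1 \times k}
```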
Training Details
Training Data
Dataset: NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases)
- Training examples: 49,997
- Data distribution:
  - Normal Traffic: 53.5%
  - DoS (Denial of Service): 36.5%
  - Probe: 9.3%
  - R2L (Remote to Local): 0.8%
  - U2R (User to Root): 0.04%
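Converting the rounded percentages into approximate example counts shows how thin the rare classes are. This is a quick sketch: the shares are copied from the list above and sum to 100.14% because of rounding, so the counts are estimates, not the true split.

```python
# Class shares as listed in this card (they sum to 100.14% because of rounding).
DISTRIBUTION = {
    "Normal": 0.535,
    "DoS": 0.365,
    "Probe": 0.093,
    "R2L": 0.008,
    "U2R": 0.0004,
}
TOTAL_EXAMPLES = 49_997


def approx_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Approximate per-class example counts from rounded percentage shares."""
    return {label: round(total * share) for label, share in shares.items()}


if __name__ == "__main__":
    for label, count in approx_counts(TOTAL_EXAMPLES, DISTRIBUTION).items():
        print(f"{label:>6}: ~{count:,}")
```

U2R works out to roughly 20 examples and R2L to roughly 400.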
Training Strategy: RAFT (Retrieval Augmented Fine-Tuning)
The model was trained using RAFT methodology with three modes:
- Oracle Mode (19.9%): Learning from relevant documents
- Distractor Mode (60.4%): Learning to identify irrelevant context
- No Context (19.8%): Learning to generate without external context
This approach teaches the model to:
- Generate responses with proper citations
- Distinguish relevant from irrelevant information
- Function effectively in RAG (Retrieval Augmented Generation) systems
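The three RAFT modes listed above can be sketched as a sampler that draws one mode per training example, using the card's percentages as weights. This is a hypothetical illustration, not the author's pipeline:

- `RaftExample`, `build_raft_example` and `k` (documents per example) are illustrative names.
- In oracle mode, the relevant document is placed among distractors. This follows the RAFT paper (Zhang et al.); this card does not say how oracle examples were assembled.

```python
import random
from dataclasses import dataclass, field

# Mode shares from this card (they sum to 100.1% because of rounding;
# random.choices normalizes the weights).
RAFT_MODES = {"oracle": 0.199, "distractor": 0.604, "no_context": 0.198}


@dataclass
class RaftExample:
    mode: str
    question: str
    documents: list[str] = field(default_factory=list)
    answer: str = ""


def build_raft_example(question, answer, oracle_doc, distractor_pool, rng, k=3):
    """Assemble one training example under a randomly drawn RAFT mode (hypothetical helper)."""
    mode = rng.choices(list(RAFT_MODES), weights=list(RAFT_MODES.values()))[0]
    if mode == "oracle":
        # Assumption (following the RAFT paper): hide the relevant document among distractors.
        docs = rng.sample(distractor_pool, k - 1) + [oracle_doc]
        rng.shuffle(docs)
    elif mode == "distractor":
        # Only irrelevant documents, so the model learns to recognize unhelpful context.
        docs = rng.sample(distractor_pool, k)
    else:
        # No retrieved context at all.
        docs = []
    return RaftExample(mode, question, docs, answer)
```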
Training Configuration
Fine-tuning Method: DoRA (Weight-Decomposed Low-Rank Adaptation)
- Total Parameters: 15B nominal (8,416,026,624 as reported during 4-bit QLoRA training)
- Trainable Parameters: 275.8M (3.28% of the 8.4B reported count; about 1.8% of the nominal 15B)
- LoRA Rank: 64
- LoRA Alpha: 128
- Target Modules: 7 linear projection module types, applied across the decoder layers
- Vision Components: Frozen (217M parameters)
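A back-of-the-envelope count lands on the 275.8M trainable-parameter figure. Each LoRA pair adds r × (in + out) weights per module, and DoRA adds one magnitude value per output feature. The count rests on two assumptions, neither stated in this card; check both against the base model's config.json.

- The seven target modules are the usual q/k/v/o attention projections plus the gate/up/down MLP projections.
- The text decoder has Mistral-NeMo-style dimensions: 48 layers, hidden size 5120, 32 query heads and 8 key/value heads of size 128, and MLP size 14336.

```python
# Assumed decoder dimensions (verify against the base model's config.json).
NUM_LAYERS = 48
HIDDEN = 5120
HEAD_DIM = 128
NUM_HEADS = 32
NUM_KV_HEADS = 8
INTERMEDIATE = 14336
RANK = 64  # LoRA rank from this card

# (in_features, out_features) for each assumed target module in one decoder layer
MODULES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def dora_trainable_params(modules, rank, num_layers):
    per_layer = 0
    for in_f, out_f in modules.values():
        per_layer += rank * (in_f + out_f)  # LoRA A (rank x in) and B (out x rank)
        per_layer += out_f                  # DoRA magnitude vector, one entry per output feature
    return per_layer * num_layers


total = dora_trainable_params(MODULES, RANK, NUM_LAYERS)
print(f"{total:,} trainable parameters")  # 275,841,024
print(f"{total / 8_416_026_624:.2%} of the reported total")  # 3.28%
```

The agreement with 275.8M supports these assumptions, but it does not prove them.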
Training Hyperparameters:
- Epochs: 1
- Training Steps: 3,125
- Learning Rate: 2e-5
- Batch Size (device): 4
- Gradient Accumulation: 4
- Effective Batch Size: 16
- Optimizer: AdamW with patched implementation
- Precision: 4-bit quantization (QLoRA)
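The batch settings and the step count above are consistent, as this quick check shows. It assumes a single GPU, so there is no data-parallel factor. The exact rounding a given trainer applies to the final partial batch may differ slightly.

```python
import math

TRAIN_EXAMPLES = 49_997
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
EPOCHS = 1

# Examples consumed per optimizer step on one GPU.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Round the final partial batch up to a full step.
optimizer_steps = math.ceil(TRAIN_EXAMPLES / effective_batch) * EPOCHS

print(effective_batch, optimizer_steps)  # 16 3125
```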
Training Performance:
- Initial Loss: 3.14
- Final Loss: 0.038-0.092
- Loss Reduction: 97-99% relative to the initial training loss (3.14 → 0.092 is 97.1%; 3.14 → 0.038 is 98.8%)
- Training Duration: 12 hours
- Hardware: NVIDIA A100 GPU (40GB)
- Platform: Google Colab Pro
Intended Uses
Direct Use
This model is designed for:
- Network traffic analysis and intrusion detection
- Cybersecurity threat classification
- Security incident response support
- Educational purposes in cybersecurity training
- RAG-based cybersecurity question answering systems
Attack Detection Capabilities
The model can identify and analyze:
- DoS/DDoS attacks: Denial of Service and Distributed Denial of Service
- Probe attacks: Port scanning, vulnerability scanning
- R2L attacks: Remote to Local unauthorized access attempts
- U2R attacks: User to Root privilege escalation
- Normal traffic: Baseline network behavior
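NSL-KDD records carry specific attack labels such as `neptune` or `satan`. These are grouped into the four attack categories above. When relabeling your own evaluation data into the same five classes, the conventional KDD Cup '99 grouping is a reasonable starting point. This card does not say how the author grouped labels, so treat the mapping below as the standard convention, not the training pipeline's. It covers the training-set labels only; the NSL-KDD test set adds further labels.

```python
# Standard grouping of the KDD Cup '99 / NSL-KDD training-set attack labels
# into the four attack categories used in this card.
ATTACK_CATEGORIES = {
    "DoS": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "Probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "R2L": {"ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"},
    "U2R": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
}
LABEL_TO_CATEGORY = {
    label: category for category, labels in ATTACK_CATEGORIES.items() for label in labels
}


def category_for(label: str) -> str:
    """Map a raw NSL-KDD label (e.g. 'neptune') to Normal/DoS/Probe/R2L/U2R."""
    # Raw KDD Cup '99 files end labels with '.', e.g. 'smurf.'
    label = label.strip().lower().rstrip(".")
    if label == "normal":
        return "Normal"
    try:
        return LABEL_TO_CATEGORY[label]
    except KeyError:
        raise ValueError(f"unrecognized NSL-KDD label: {label!r}") from None
```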
Out-of-Scope Use
⚠️ This model should NOT be used:
- As the sole authority for security decisions without human oversight
- For real-time critical infrastructure protection without validation
- On network architectures or attack vectors not represented in NSL-KDD
- For production security without thorough testing and validation
Usage
Basic Usage
```python
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

model_id = "sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED"

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Prepare conversation
messages = [
    {
        "role": "system",
        "content": "You are a cybersecurity expert specializing in network intrusion detection and analysis.",
    },
    {
        "role": "user",
        "content": "Based on the provided network traffic analysis documents, identify potential security threats in this connection pattern.",
    },
]

# Build the prompt text and tokenize it
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
)

# Decode only the newly generated tokens (outputs[0] also contains the prompt)
prompt_length = inputs["input_ids"].shape[1]
response = processor.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```
RAG Integration
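This model was fine-tuned with RAFT to answer from supplied documents, which suits RAG (Retrieval Augmented Generation) workflows:
```python
# Example with document context (reuses `model` and `processor` from Basic Usage)
documents = [
    "Network traffic shows 50 SYN packets per second...",
    "Connection attempts from IP 192.168.1.100...",
]
# Number each document so the model can cite it as "Document N"
context = "\n\n".join(f"Document {i+1}: {doc}" for i, doc in enumerate(documents))
messages = [
    {"role": "system", "content": "You are a cybersecurity expert. Cite sources when analyzing."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: What type of attack is this?"},
]
# Generate as in Basic Usage; the model was trained to cite the documents it relies on
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```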
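In a full RAG system, the documents come from a retriever instead of a hard-coded list. The sketch below is a toy, dependency-free stand-in for an embedding-based retriever. It ranks a corpus by bag-of-words cosine similarity and emits the same numbered `Document N:` context format as the example above. `build_context` and its helpers are hypothetical names, not part of this model's tooling.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    """Lowercased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_context(query: str, corpus: list[str], k: int = 3) -> str:
    """Pick the k corpus entries most similar to the query and number them for the prompt."""
    q = _tokens(query)
    # sorted() is stable, so ties keep their original corpus order.
    ranked = sorted(corpus, key=lambda doc: _cosine(q, _tokens(doc)), reverse=True)
    return "\n\n".join(f"Document {i + 1}: {doc}" for i, doc in enumerate(ranked[:k]))
```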
Limitations
- Dataset Specificity: Trained on NSL-KDD patterns; may not generalize to all network architectures
- Text-Only Training: The vision encoder was frozen and only text data was used, so image inputs received no cybersecurity-specific adaptation
- Temporal Coverage: Training data may not reflect the latest attack vectors
- Citation Dependency: Trained for RAG workflows; works best with document context
- Language: English only; multilingual capabilities not validated
Bias, Risks, and Limitations
Known Biases
- Attack Type Imbalance: Heavy bias toward DoS (36.5%) and Normal traffic (53.5%); limited exposure to U2R attacks (0.04%)
- Dated, Simulated Data: NSL-KDD is a refined version of KDD Cup '99, which was derived from simulated 1998 DARPA network traffic; it may not reflect modern cloud/IoT environments
Risks
- False Positives/Negatives: Should not be sole arbiter of security decisions
- Adversarial Robustness: Not explicitly trained against adversarial attacks
- Evolving Threats: Requires continuous updating for new attack patterns
Recommendations
Users should:
- ✅ Use as a decision support tool alongside human expertise
- ✅ Validate outputs in production environments
- ✅ Regularly update with new threat intelligence
- ✅ Test thoroughly on their specific network architecture
- ✅ Implement proper monitoring and feedback loops
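Testing on your own network works best with per-class metrics. With U2R at 0.04% of the training data, overall accuracy can look high while recall on rare classes is zero. The sketch below is dependency-free; scikit-learn's `classification_report` gives a fuller version. It assumes the model's free-text answers have already been mapped to one of the five category names. The toy labels in the usage lines are made up.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Per-class (precision, recall) from parallel lists of category labels."""
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        report[label] = (precision, recall)
    return report


# Toy usage: the single U2R case is missed, so U2R recall is 0.
y_true = ["DoS", "DoS", "Normal", "U2R"]
y_pred = ["DoS", "Normal", "Normal", "Normal"]
for label, (p, r) in per_class_report(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```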
Evaluation
Training Performance
- Loss Convergence: 3.14 → 0.038-0.092 (97-99% reduction in training loss)
- Training Stability: Consistent convergence across 3,125 steps
- Checkpoint Consistency: Stable performance maintained throughout training
Validation Approach
The RAFT training setup targets:
- Ability to identify relevant vs. irrelevant documents
- Citation accuracy and source attribution
- Context-aware response generation
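One of these targets, citation accuracy, can be partly checked automatically. The sketch below assumes the model cites sources as "Document N", matching the numbered context in the RAG Integration example; this card does not specify a citation format. It only verifies that citations point at documents that were actually supplied. It cannot tell whether a cited document supports the claim. `check_citations` is a hypothetical helper.

```python
import re

CITATION = re.compile(r"\bDocument\s+(\d+)\b")


def check_citations(response: str, num_documents: int) -> dict:
    """Split 'Document N' citations into valid and out-of-range ones; list uncited documents."""
    cited = sorted({int(n) for n in CITATION.findall(response)})
    return {
        "cited": cited,
        "invalid": [n for n in cited if not 1 <= n <= num_documents],
        "uncited": [n for n in range(1, num_documents + 1) if n not in cited],
    }
```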
Technical Specifications
Model Architecture
- Base: Apriel-1.5-15b-Thinker (LLaVA-based architecture)
- Vision Encoder: Frozen during training
- Text Decoder: Fine-tuned with DoRA adapters
- Precision: BFloat16
- Training Sequence Length: 578 tokens (set to fit the training data; the inference context window is inherited from the base model)
Compute Infrastructure
Hardware:
- GPU: NVIDIA A100 (40GB VRAM)
- Platform: Google Colab Pro
- Training Time: 12 hours
- Estimated Cost: ~$23 (12 hours × $1.95/hr A100 rate)
Software:
- Framework: HuggingFace Transformers (v4.46.0)
- PEFT: v0.17.0
- Training: TRL + bitsandbytes (4-bit quantization)
- PyTorch: Latest stable
Environmental Impact
Estimated Carbon Emissions:
- Training Duration: 12 hours on A100 GPU
- Cloud Provider: Google Cloud Platform
- Estimated emissions: ~5-6 kg CO2eq (based on average cloud GPU usage)
Note: This is a conservative estimate. Actual emissions depend on datacenter location and energy sources.
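A common back-of-the-envelope method multiplies energy use by a datacenter overhead factor (PUE) and the grid's carbon intensity. The sketch below implements that formula only. You supply every input for your own hardware and region, and this card does not state which values produced its 5-6 kg figure.

```python
def estimate_emissions_kg(gpu_power_kw: float, hours: float, pue: float,
                          grid_kg_per_kwh: float) -> float:
    """Operational CO2eq estimate: energy drawn (kWh) x datacenter overhead (PUE)
    x grid carbon intensity (kg CO2eq per kWh). Ignores embodied hardware emissions."""
    energy_kwh = gpu_power_kw * hours * pue
    return energy_kwh * grid_kg_per_kwh
```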
Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{apriel-cybersec-2025,
  author = {Juluri, Sainikhil},
  title = {Apriel Cybersecurity Model: DoRA + RAFT Fine-tuned for Network Intrusion Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sainikhiljuluri2015/Apriel-1.5-15b-Thinker-CYBERSEC-MERGED}}
}
```
Acknowledgments
- Base Model: ServiceNow-AI for Apriel-1.5-15b-Thinker
- Dataset: NSL-KDD (Canadian Institute for Cybersecurity)
- Methodology: DoRA (Liu et al.) and RAFT (Zhang et al.)
- Training Platform: Google Colab Pro
Model Card Contact
Author: Sainikhil Juluri
Project: Cybersecurity AI System (College Project)
For questions, issues, or collaboration opportunities, please open an issue on the model repository or contact via HuggingFace.
Last Updated: November 2025
Model Version: 1.0
Status: Production Ready ✅