SIEM Log Generator - Mistral 7B QLoRA

A fine-tuned Mistral-7B model specialized in Security Information and Event Management (SIEM) log analysis and generation. It was fine-tuned with QLoRA (LoRA adapters trained on top of a 4-bit quantized base model) on several cybersecurity log sources to interpret and generate security event data.

Model Description

This model is a specialized variant of Mistral-7B-Instruct fine-tuned for SIEM operations, including:

  • Network traffic analysis (DDoS detection, port scanning)
  • Authentication event monitoring (credential stuffing, brute force)
  • Cloud security events (AWS CloudTrail analysis)
  • System log interpretation
  • MITRE ATT&CK framework mapping

Training Data Sources

The model was trained on a diverse set of security logs:

  • Network Logs: CICIDS2017 dataset (DDoS, PortScan patterns)
  • Authentication Logs: Risk-based authentication events
  • System Logs: Linux/Unix syslog events
  • Cloud Logs: AWS CloudTrail security events

MITRE ATT&CK Coverage

The model recognizes and maps events to MITRE ATT&CK techniques:

  • T1499: Endpoint Denial of Service (DDoS)
  • T1046: Network Service Discovery (formerly Network Service Scanning)
  • T1110: Brute Force
  • T1110.004: Credential Stuffing
  • T1078.004: Valid Accounts: Cloud Accounts
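Generated text can cite techniques outside this list, so a downstream check can extract technique IDs from a response and separate covered IDs from unknown ones. This is a minimal sketch, assuming responses mention techniques by ID (for example `T1110.004`); `COVERED_TECHNIQUES` and `extract_techniques` are hypothetical names, and the regex follows ATT&CK's `Txxxx` / `Txxxx.yyy` ID format.

```python
import re

# Technique IDs listed in this model card's coverage section.
COVERED_TECHNIQUES = {"T1499", "T1046", "T1110", "T1110.004", "T1078.004"}

# ATT&CK IDs: "T" + four digits, optionally ".NNN" for a sub-technique.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_techniques(text):
    """Return (covered, unknown) technique IDs in order of first appearance."""
    covered, unknown = [], []
    for tid in TECHNIQUE_ID.findall(text):
        target = covered if tid in COVERED_TECHNIQUES else unknown
        if tid not in target:
            target.append(tid)
    return covered, unknown


if __name__ == "__main__":
    sample = "attack=CredentialStuffing mitre=T1110.004 parent=T1110 related=T1621"
    covered, unknown = extract_techniques(sample)
    print(covered)  # ['T1110.004', 'T1110']
    print(unknown)  # ['T1621']
```

Unknown IDs are not necessarily wrong. They fall outside what the model was trained to map, so they are good candidates for analyst review.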

Training Details

Training Configuration

  • Base Model: mistralai/Mistral-7B-Instruct-v0.2
  • Method: QLoRA (4-bit quantization with LoRA adapters)
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Target Modules: q_proj, v_proj
  • Training Samples: ~500 diverse security events
  • Batch Size: 8
  • Learning Rate: 5e-4
  • Precision: bfloat16
  • Training Steps: 50
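The configuration above implies a very small trainable footprint. The sketch below works out the LoRA parameter count, the update scale, and how much data the run covered. It assumes Mistral-7B's published architecture (32 decoder layers, hidden size 4096, and 8 key/value heads of dimension 128, so `v_proj` outputs 1024 features) and no gradient accumulation, which the card does not state. Each adapted linear layer of shape `in -> out` gains `r * (in + out)` parameters (A is r×in, B is out×r), and LoRA scales its update by `alpha / r`.

```python
# Mistral-7B dimensions (taken from the base model's config; assumed here)
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128  # num_key_value_heads * head_dim = output width of v_proj

# Values from the training configuration above
RANK = 8
ALPHA = 16
BATCH_SIZE = 8
STEPS = 50
NUM_SAMPLES = 500  # approximate ("~500")


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA adapter: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)


# q_proj is 4096 -> 4096, v_proj is 4096 -> 1024, in every decoder layer
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK
samples_seen = BATCH_SIZE * STEPS  # assumes no gradient accumulation
epochs = samples_seen / NUM_SAMPLES

print(f"trainable LoRA params: {trainable:,}")  # 3,407,872
print(f"update scaling alpha/r: {scaling}")  # 2.0
print(f"samples seen: {samples_seen} (~{epochs:.1f} epochs)")  # 400 (~0.8 epochs)
```

About 3.4M trainable parameters is well under 0.1% of the base model's weights. Roughly 400 processed examples also means the run made less than one full pass over the data, which is consistent with the demonstration-scale caveat under Limitations.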

Hardware

  • GPU: NVIDIA Tesla T4 (16GB VRAM)
  • Platform: Kaggle Notebooks
  • Training Time: ~5-10 minutes

Usage

Installation

pip install transformers peft torch bitsandbytes accelerate

Loading the Model

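from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "your-username/siem-log-generator-mistral-7b-qlora"  # placeholder: replace with the adapter repo ID

# Load the base model with 4-bit quantization (passing load_in_4bit directly
# to from_pretrained is deprecated in favor of a BitsAndBytesConfig).
# float16 compute runs on a T4; bfloat16 needs an Ampere-or-newer GPU for native support.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapters and switch to inference mode
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Generate security event analysis.
# The prompt already starts with <s>, so skip the tokenizer's automatic BOS token.
prompt = "<s>[INST] event=network attack=DDoS [/INST]"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))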

Inference Example

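# Analyze a security event (reuses model, tokenizer, and torch from above)
event = "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce"
prompt = f"<s>[INST] {event} [/INST]"

# The prompt already contains <s>, so don't let the tokenizer add a second BOS
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # Mistral defines no pad token
    )

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)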
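Both prompts above are space-separated `key=value` pairs wrapped in Mistral's `[INST] ... [/INST]` tags. A pair of small helpers keeps that format consistent and turns `key=value` text back into a dictionary. This is a sketch that assumes the model answers in the same `key=value` style as its inputs, because the card does not document the output format. `build_prompt` and `parse_kv` are hypothetical helpers, and values containing spaces are not supported.

```python
def build_prompt(fields):
    """Format an event dict as a Mistral-instruct prompt of space-separated key=value pairs."""
    for key, value in fields.items():
        if " " in str(key) or " " in str(value):
            raise ValueError(f"spaces are not supported in {key}={value}")
    body = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"<s>[INST] {body} [/INST]"


def parse_kv(text):
    """Parse whitespace-separated key=value tokens; tokens without '=' are ignored."""
    result = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep and key:
            result[key] = value
    return result


if __name__ == "__main__":
    event = {"timestamp": "2024-01-14T10:30:00Z", "event": "auth",
             "user": "admin", "attack": "BruteForce"}
    print(build_prompt(event))
    # <s>[INST] timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce [/INST]
    print(parse_kv("event=auth attack=BruteForce mitre=T1110"))
    # {'event': 'auth', 'attack': 'BruteForce', 'mitre': 'T1110'}
```

Pass `build_prompt(...)` as the `prompt` argument, and run `parse_kv` on the decoded `response`.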

Use Cases

1. Security Event Classification

Classify incoming logs into attack types or benign traffic.

2. MITRE ATT&CK Mapping

Automatically map security events to MITRE ATT&CK framework techniques.

3. Log Enrichment

Generate additional context and metadata for security events.

4. Threat Intelligence

Analyze patterns and generate threat reports from log data.

5. Training Data Generation

Create synthetic security logs for testing SIEM systems.
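Synthetic logs produced by the model should be screened before they enter a SIEM test pipeline. This is a minimal sketch that assumes each generated line uses the same `key=value` layout as the examples above. It treats `timestamp`, `event`, and `attack` as required fields, a hypothetical schema chosen only for illustration. `validate_log_line` is likewise a hypothetical name.

```python
from datetime import datetime

REQUIRED_FIELDS = ("timestamp", "event", "attack")  # hypothetical schema


def validate_log_line(line):
    """Return a list of problems in one generated key=value log line (empty if valid)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            return [f"malformed token: {token!r}"]
        fields[key] = value

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    ts = fields.get("timestamp")
    if ts is not None:
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"bad timestamp: {ts!r}")
    return problems


if __name__ == "__main__":
    generated = [
        "timestamp=2024-01-14T10:30:00Z event=auth user=admin attack=BruteForce",
        "timestamp=yesterday event=network attack=DDoS",
        "event=cloud user=svc-backup",
    ]
    for line in generated:
        print(validate_log_line(line) or "ok")
```

Lines that fail can be dropped or regenerated. Checking structure this way is cheap compared with discovering malformed events after they have been ingested.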

Limitations

  • Training Data: Model trained on limited samples (~500) for demonstration
  • Domain Specific: Optimized for SIEM/security logs, not general purpose
  • Language: English only
  • Real-time: Not optimized for ultra-low latency applications
  • Accuracy: Should be used as an assistive tool, not sole decision-maker

Ethical Considerations

⚠️ Important Security Notice:

  • This model is for defensive cybersecurity purposes only
  • Do not use for malicious activities or unauthorized access
  • Always comply with applicable laws and regulations
  • Validate all model outputs before taking action
  • Use in conjunction with human security experts

Model Card Authors

Created by the SIEM Research Team

Citation

If you use this model in your research, please cite:

@misc{siem-log-generator-2025,
  author = {Your Name},
  title = {SIEM Log Generator - Mistral 7B QLoRA},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/siem-log-generator-mistral-7b-qlora}
}

License

This model inherits the Apache 2.0 license from Mistral-7B-Instruct-v0.2.

Acknowledgments

  • Mistral AI for the base Mistral-7B-Instruct-v0.2 model
  • CICIDS2017 dataset contributors
  • Hugging Face for the model hosting platform
  • QLoRA paper authors for the efficient fine-tuning method

Contact

For questions or issues, please open an issue on the model repository.


Note: This is a research/demonstration model. For production SIEM deployments, additional training on larger, domain-specific datasets is recommended.

Repository: sohomn/Model_trained_on_5kparams (fine-tuned from mistralai/Mistral-7B-Instruct-v0.2)