---
license: mit
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit
tags:
- cybersecurity
- mitre-attack
- honeypot
- log-analysis
- llama
- lora
- security
- threat-detection
language:
- en
datasets:
- custom
library_name: transformers
pipeline_tag: text-generation
---

# LLM-Enhanced Honeypot Log Analysis Model

## Model Description

This model is a fine-tuned version of Llama 3.1 8B Instruct, specialized for analyzing honeypot logs and generating MITRE ATT&CK framework annotations. It was developed as part of a research project at Queen's University Belfast investigating automated security log analysis using Large Language Models.

## Key Features

- **MITRE ATT&CK Annotation**: Automatically generates structured annotations for security events
- **Honeypot Log Analysis**: Specialized in analyzing Unix terminal logs from honeypot systems
- **LoRA Fine-tuning**: Uses Low-Rank Adaptation for efficient parameter updates
- **Research-Grade**: Developed for academic research in cybersecurity and AI

## Model Details

### Base Model

- **Base Model**: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit
- **Model Size**: 8B parameters
- **Architecture**: Llama 3.1 with Instruct tuning
- **Quantization**: 4-bit quantization for efficiency

### Fine-tuning Details

- **Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 32
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0
- **Learning Rate**: 0.00012
- **Batch Size**: 2
- **Gradient Accumulation**: 4
- **Max Steps**: 100
- **Optimizer**: adamw_8bit

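
The hyperparameters listed under Fine-tuning Details imply a few derived quantities that are easy to overlook. The sketch below computes them in plain Python. The `FineTuneConfig` class and its property names are illustrative and not part of the project's code. It assumes training on a single device, that `Max Steps` counts optimizer updates, and that standard LoRA scaling (`alpha / rank`) was used rather than a rank-stabilized variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in this model card (class name is hypothetical)."""

    lora_rank: int = 32
    lora_alpha: int = 32
    lora_dropout: float = 0.0
    learning_rate: float = 0.00012
    batch_size: int = 2
    gradient_accumulation: int = 4
    max_steps: int = 100

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update (single device assumed).
        return self.batch_size * self.gradient_accumulation

    @property
    def examples_seen(self) -> int:
        # Upper bound on the training examples processed over the whole run.
        return self.effective_batch_size * self.max_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


config = FineTuneConfig()
print(config.effective_batch_size, config.examples_seen, config.lora_scaling)
```

Under these assumptions each optimizer step uses 8 examples, the 100-step run processes at most 800 examples, and the LoRA update is applied with a scaling factor of 1.0.
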
## Training Data

The model was trained on a curated dataset of honeypot logs with human-annotated MITRE ATT&CK framework labels. The training data includes:

- Unix terminal command logs from honeypot systems
- Structured annotations for 6 key MITRE ATT&CK fields
- Balanced representation of different attack tactics and techniques

## Usage

### Installation

```bash
pip install transformers torch unsloth
```

### Loading the Model

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/model-name",  # placeholder: replace with the actual repository ID
    max_seq_length=2048,
    dtype=None,         # None lets Unsloth choose a dtype for the current GPU
    load_in_4bit=True,  # load the weights in 4-bit precision
)
```

### Inference

```python
# Enable inference mode
FastLanguageModel.for_inference(model)

# Prompt template; the placeholders are filled in with str.format below
prompt_template = '''Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations.

Command: {command}
Timestamp: {timestamp}
Source IP: {source_ip}

Please provide:
1. Tactic
2. Technique
3. Sub-technique
4. Description'''

# Illustrative values only; substitute the fields from your own honeypot logs
prompt = prompt_template.format(
    command="cat /etc/passwd",
    timestamp="2025-01-01T00:00:00Z",
    source_ip="192.0.2.10",
)

# Move the input tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Evaluation

The model was evaluated using the following metrics and analyses:

- **Overall MITRE Accuracy**: Novel composite metric combining all 6 MITRE ATT&CK field accuracies
- **Confusion Matrix Analysis**: Visual analysis of tactic classification performance
- **Field-level Accuracy**: Individual accuracy for each MITRE ATT&CK field
- **Human Evaluation**: Expert validation of generated annotations

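
The card does not state how the field accuracies are combined into the overall score, so the following is only a sketch of one plausible reading: exact-match accuracy per field, an unweighted mean as the composite, and a count of (reference, predicted) pairs as the raw material for a confusion matrix. The function names are hypothetical, only two fields are shown, and the toy annotations are made up for illustration; they are not results from this model.

```python
from collections import Counter


def field_accuracies(predictions, references, fields):
    """Exact-match accuracy per field over paired annotation dicts."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    return {
        field: sum(p.get(field) == r.get(field) for p, r in zip(predictions, references))
        / len(references)
        for field in fields
    }


def overall_accuracy(per_field):
    """Unweighted mean of per-field accuracies (an assumed combination rule)."""
    return sum(per_field.values()) / len(per_field)


def confusion_counts(predictions, references, field="tactic"):
    """Count (reference label, predicted label) pairs for a single field."""
    return Counter((r.get(field), p.get(field)) for p, r in zip(predictions, references))


# Toy data for illustration only.
references = [
    {"tactic": "Discovery", "technique": "T1082"},
    {"tactic": "Execution", "technique": "T1059"},
    {"tactic": "Discovery", "technique": "T1083"},
    {"tactic": "Persistence", "technique": "T1053"},
]
predictions = [
    {"tactic": "Discovery", "technique": "T1082"},
    {"tactic": "Execution", "technique": "T1059"},
    {"tactic": "Execution", "technique": "T1083"},
    {"tactic": "Persistence", "technique": "T1053"},
]

per_field = field_accuracies(predictions, references, fields=("tactic", "technique"))
overall = overall_accuracy(per_field)
confusion = confusion_counts(predictions, references, field="tactic")
print(per_field, overall, confusion)
```

The same functions extend to all six fields by passing the full list of field names. A weighted mean or an all-fields-correct rule would be equally valid definitions of a composite and would give different numbers.
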
## Limitations

- Specialized for honeypot log analysis; may not generalize to other security contexts
- Requires structured input format for optimal performance
- Training data limited to specific honeypot configurations
- May exhibit biases present in training data

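
Because the model expects the structured prompt shown in the Usage section, it can help to validate log records before building prompts. The following is a minimal sketch: `build_prompt`, `REQUIRED_FIELDS`, and the record layout are hypothetical helpers rather than part of the released code, and the template text is copied from the Inference example.

```python
PROMPT_TEMPLATE = """Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations.

Command: {command}
Timestamp: {timestamp}
Source IP: {source_ip}

Please provide:
1. Tactic
2. Technique
3. Sub-technique
4. Description"""

# Fields the template needs; the dict keys are an assumed record layout.
REQUIRED_FIELDS = ("command", "timestamp", "source_ip")


def build_prompt(record: dict) -> str:
    """Fill the prompt template from one log record, rejecting incomplete input."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    if missing:
        raise ValueError(f"log record is missing required fields: {missing}")
    return PROMPT_TEMPLATE.format(**{field: record[field] for field in REQUIRED_FIELDS})


# Illustrative record; the values are placeholders, not real log data.
example_record = {
    "command": "cat /etc/passwd",
    "timestamp": "2025-01-01T00:00:00Z",
    "source_ip": "192.0.2.10",
}
example_prompt = build_prompt(example_record)
print(example_prompt)
```

Rejecting incomplete records up front keeps malformed prompts from reaching the model, where the effect on annotation quality would be harder to diagnose.
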
## Ethical Considerations

This model is designed for defensive cybersecurity research and should be used responsibly:

- Intended for legitimate security research and defense applications
- Should not be used for malicious purposes or unauthorized access
- Users should validate outputs before making security decisions
- Consider privacy implications when analyzing logs

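
One way to act on the privacy point is to pseudonymize source IP addresses before logs are stored or shared. The sketch below uses a keyed hash from the Python standard library. The `pseudonymize_ip` helper and the key are hypothetical, and the card does not establish whether replacing the IP in the prompt changes the model's annotations, so treat this as an option to evaluate rather than a recommended pipeline step.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Replace an IP address with a stable token derived from a keyed hash."""
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    # A truncated digest keeps tokens short while remaining stable per address.
    return f"ip-{digest[:16]}"


# Hypothetical key for illustration; keep a real key out of version control.
key = b"replace-with-a-secret-key"
token = pseudonymize_ip("192.0.2.10", key)
print(token)
```

The same address always maps to the same token under a given key, so activity from one source can still be correlated without exposing the address itself.
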
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{llm_honeypot_analysis_2025,
  title={LLM-Enhanced Honeypot Log Analysis System},
  author={[Student Name]},
  year={2025},
  institution={Queen's University Belfast},
  course={CSC4006 - Research Project},
  url={https://gitlab.eeecs.qub.ac.uk/[student-id]/CSC4006}
}
```

## License

This model is released under the MIT License. See the LICENSE file for details.

## Contact

For questions or issues:

- Repository: https://gitlab.eeecs.qub.ac.uk/40285272/CSC4006
- Institution: Queen's University Belfast
- Course: CSC4006 - Research Project

## Acknowledgments

- Built using the Unsloth library for efficient training
- Based on Meta's Llama 3.1 model
- Developed as part of cybersecurity research at Queen's University Belfast