---
license: mit
tags:
  - codellama
  - linux
  - bugfix
  - lora
  - qlora
  - git-diff
base_model: codellama/CodeLlama-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation

model-index:
- name: CodeLLaMA-Linux-BugFix
  results:
  - task:
      type: text-generation
      name: Bug-fix Patch Generation
    dataset:
      type: custom
      name: Linux Kernel Bugfix Commits
      config: linux-bugfix-prompt-completion
      split: test
    metrics:
      - type: bleu
        value: 33.87
        name: BLEU
      - type: rouge1
        value: 0.4355
        name: ROUGE-1 F1
      - type: rouge2
        value: 0.3457
        name: ROUGE-2 F1
      - type: rougeL
        value: 0.3612
        name: ROUGE-L F1
---

  # CodeLLaMA-Linux-BugFix

  `CodeLLaMA-7B-Instruct` fine-tuned with QLoRA (Quantized Low-Rank Adaptation) for Linux kernel bug fixing. Given buggy C code and a commit message, the model generates a Git diff patch.

  ---

  ## 🎯 Overview

  This project targets automated Linux kernel bug fixing by:

  - **Mining real commit data** from the kernel Git history
  - **Training a specialized QLoRA model** on diff-style fixes
  - **Generating Git patches** in response to bug-prone code
  - **Evaluating results** using BLEU, ROUGE, and human inspection

  On the test split the model reaches a BLEU score of 33.87 and a ROUGE-1 F1 of 0.4355 against the reference patches, which makes it a useful aid for automated code review and bug fixing.

  ---

  ## 📊 Performance Results

  ### Evaluation Metrics

  ✅ **BLEU Score**: 33.87

  ✅ **ROUGE Scores**:
  - **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
  - **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
  - **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612

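  These scores measure token overlap with the reference patches rather than correctness. Within that limit, they suggest that the model can:
  - Generate output in Git diff format that overlaps substantially with the reference patches
  - Recover most of the reference tokens (recall of 0.61 to 0.73), though at lower precision (0.29 to 0.38), meaning the output also contains tokens the references lack
  - Produce code changes close to the reference fixes; whether a patch actually resolves the bug still requires human inspection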
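
One detail worth noting when reading the scores above: each reported F1 is lower than the harmonic mean of the reported precision and recall. That pattern is what one would expect if P, R, and F1 were each averaged over the test samples separately, although the evaluation script is not shown here, so this is an assumption. The sketch below uses a simplified whitespace-token ROUGE-1 (not the scorer that produced the results) and two invented samples to illustrate the effect.

```python
from collections import Counter


def harmonic_mean(p: float, r: float) -> float:
    """F1 of a single precision/recall pair."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def rouge1(candidate: str, reference: str):
    """Simplified ROUGE-1 on whitespace tokens: returns (precision, recall, F1)."""
    cand, ref = candidate.split(), reference.split()
    # Clipped unigram overlap: a token counts at most as often as it occurs in the reference
    overlap = sum((Counter(cand) & Counter(ref)).values())
    p = overlap / len(cand) if cand else 0.0
    r = overlap / len(ref) if ref else 0.0
    return p, r, harmonic_mean(p, r)


# Two toy (candidate, reference) pairs, invented for illustration
pairs = [
    ("a b c d", "a b"),   # verbose output: P=0.5, R=1.0
    ("x y", "x y z w"),   # terse output:   P=1.0, R=0.5
]
scores = [rouge1(c, r) for c, r in pairs]
mean_p = sum(s[0] for s in scores) / len(scores)
mean_r = sum(s[1] for s in scores) / len(scores)
mean_f1 = sum(s[2] for s in scores) / len(scores)

print(f"mean of per-sample F1:   {mean_f1:.4f}")
print(f"F1 of mean P and mean R: {harmonic_mean(mean_p, mean_r):.4f}")
# The same comparison for the ROUGE-1 precision and recall reported above
print(f"harmonic mean of 0.3775 and 0.7306: {harmonic_mean(0.3775, 0.7306):.4f}")
```

The mean of per-sample F1 scores is never higher than the F1 of the averaged precision and recall, so the two should not be compared directly across papers or tools.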

  ---

  ## 🧠 Model Configuration

  - **Base model**: `CodeLLaMA-7B-Instruct`
  - **Fine-tuning method**: QLoRA with 4-bit quantization
  - **Training setup**:
    - LoRA r=64, alpha=16, dropout=0.1
    - Batch size: 64, LR: 2e-4, Epochs: 3
    - Mixed precision (bfloat16), gradient checkpointing
  - **Hardware**: Optimized for NVIDIA H200 GPUs
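
To give a feel for what `r=64, alpha=16` means in practice, the sketch below counts the parameters that LoRA adds and computes the standard `alpha / r` scaling applied to the adapter update. The layer dimensions are those of the 7B Llama architecture (hidden size 4096, 32 layers). Which projection matrices receive adapters is not stated above, so `TARGETS_PER_LAYER` is an assumption made purely for illustration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 64, 16           # from the training setup above
HIDDEN, LAYERS = 4096, 32   # 7B Llama architecture dimensions
TARGETS_PER_LAYER = 2       # assumption: adapters on two square attention projections

per_matrix = lora_param_count(HIDDEN, HIDDEN, R)
total = per_matrix * TARGETS_PER_LAYER * LAYERS
scaling = ALPHA / R         # factor applied to the low-rank update B @ A

print(f"adapter parameters per 4096x4096 matrix: {per_matrix:,}")
print(f"adapter parameters in total (assumed targets): {total:,}")
print(f"LoRA scaling alpha / r: {scaling}")
```

Under these assumptions the adapters amount to tens of millions of trainable parameters, well under 1% of the base model. Adding more target modules would raise that count proportionally.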

  ---

  ## 📈 Training Progress

  The model was trained for 1000 steps with the following key metrics:

  ### Training Results

  - **Final Loss**: ~0.3335 (converged)
  - **Final Learning Rate**: ~2.08e-6
  - **Training Steps**: 1000
  - **Convergence**: Stable loss plateau achieved

  ### Training Curves

  ![Training Loss](train/output/loss.png)
  *Training loss over 1000 steps showing convergence around 0.3335*

  ![Learning Rate Schedule](train/output/learning_rate.png)
  *Learning rate decay schedule with a final rate of ~2.08e-6*

  ---

  ## 📊 Dataset

  Custom dataset extracted from Linux kernel Git history.

  ### Filtering Criteria
  Bug-fix commits containing:
  `fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
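
A keyword filter of this kind could look roughly like the sketch below. It is not the project's extraction script: the helper name `is_bugfix_commit` is hypothetical, and matching against the commit message, case-insensitively and at the start of a word, is an assumption.

```python
import re

# Keywords listed above; the source list ends in "etc.", so this is not exhaustive
KEYWORDS = ["fix", "bug", "crash", "memory", "null", "panic", "overflow", "race", "corruption"]

# Assumption: case-insensitive match at the start of a word, so "fixes" and "NULL" also match
_PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")", re.IGNORECASE)


def is_bugfix_commit(message: str) -> bool:
    """Hypothetical helper: True if the commit message mentions any keyword."""
    return _PATTERN.search(message) is not None


# Example commit subjects, invented for illustration
messages = [
    "net: fix NULL pointer dereference in tcp_recvmsg",
    "docs: update maintainers list",
    "mm: avoid overflow in page counter",
]
for msg in messages:
    print(is_bugfix_commit(msg), msg)
```

Keyword matching is coarse: a message such as "add memory hotplug support" would pass the filter without being a bug fix, so some noise in the resulting dataset is to be expected.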

  ### Structure
  - Language: C (`.c`, `.h`)
  - Context: 10 lines before/after the change
  - Format:

  ```json
  {
    "input": {
      "original code": "C code snippet with bug",
      "instruction": "Commit message or fix description"
    },
    "output": {
      "diff codes": "Git diff showing the fix"
    }
  }
  ```

  * **File**: `training_data_100k.jsonl` (100,000 samples)
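
As a rough illustration of how a record in this structure might be flattened into a prompt/completion pair, consider the sketch below. The helper `to_prompt_completion` is hypothetical and the prompt wording is borrowed from the inference example in the Quick Start section; the actual `format_for_training.py` may differ.

```python
import json

# Assumption: same wording as the prompt used in the inference example
PROMPT_TEMPLATE = (
    "Given the following original C code:\n{code}\n\n"
    "Instruction: {instruction}\n\n"
    "Return the diff that fixes it:\n"
)


def to_prompt_completion(record: dict) -> dict:
    """Hypothetical helper: flatten one dataset record into a prompt/completion pair."""
    return {
        "prompt": PROMPT_TEMPLATE.format(
            code=record["input"]["original code"],
            instruction=record["input"]["instruction"],
        ),
        "completion": record["output"]["diff codes"],
    }


# One JSONL line in the structure shown above (contents invented for illustration)
line = json.dumps({
    "input": {
        "original code": "if (!file->filter)\n    return;",
        "instruction": "Fix the null pointer dereference",
    },
    "output": {"diff codes": "-if (!file->filter)\n+if (!file || !file->filter)"},
})

pair = to_prompt_completion(json.loads(line))
print(pair["prompt"])
print(pair["completion"])
```

Note that the keys `original code` and `diff codes` contain spaces, so they have to be accessed by string lookup rather than attribute-style access.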

  ---

  ## 🚀 Quick Start

  ### Prerequisites

  - Python 3.8+
  - CUDA-compatible GPU (recommended)
  - 16GB+ RAM
  - 50GB+ disk space

  ### Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

  ### 1. Build the Dataset

  ```bash
  cd dataset_builder
  python extract_linux_bugfixes_parallel.py
  python format_for_training.py
  ```

  ### 2. Fine-tune the Model

  ```bash
  cd train
  python train_codellama_qlora_linux_bugfix.py
  ```

  ### 3. Run Evaluation

  ```bash
  cd evaluate
  python evaluate_linux_bugfix_model.py
  ```

  ### 4. Use the Model

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from peft import PeftModel

  # Load the base model, then attach the fine-tuned LoRA adapter
  base_id = "codellama/CodeLlama-7b-Instruct-hf"
  model = AutoModelForCausalLM.from_pretrained(base_id)
  model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
  tokenizer = AutoTokenizer.from_pretrained(base_id)

  # Generate a bug fix
  prompt = """
  Given the following original C code:
  if (!file->filter)
      return;

  Instruction: Fix the null pointer dereference

  Return the diff that fixes it:
  """

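  # Move the inputs to the same device as the model
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  # temperature only takes effect when sampling is enabled;
  # max_new_tokens bounds the generated text without counting the prompt
  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)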
  fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(fix)
  ```
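
Because `tokenizer.decode(outputs[0])` returns the prompt followed by the generated continuation, the printed text contains the prompt as well as the patch. One way to keep only the patch is sketched below; `extract_patch` is a hypothetical helper, and it assumes the prompt ends with the marker line used in the example above.

```python
MARKER = "Return the diff that fixes it:"


def extract_patch(decoded: str, marker: str = MARKER) -> str:
    """Hypothetical helper: return the text that follows the last marker line."""
    # rpartition splits on the last occurrence of the marker; when the marker
    # is absent, the whole string comes back as the tail, so nothing is lost.
    _, _, tail = decoded.rpartition(marker)
    return tail.strip()


# Decoded output invented for illustration
decoded = (
    "Given the following original C code:\n"
    "if (!file->filter)\n    return;\n\n"
    "Instruction: Fix the null pointer dereference\n\n"
    "Return the diff that fixes it:\n"
    "-if (!file->filter)\n"
    "+if (!file || !file->filter)\n"
)
print(extract_patch(decoded))
```

An alternative is to slice the generated token IDs past the prompt length before decoding, which avoids relying on the marker text.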

  ---

  ## 📁 Project Structure

  ```
  CodeLLaMA-Linux-BugFix/
  ├── dataset_builder/
  │   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
  │   ├── format_for_training.py                # Format data for training
  │   └── build_dataset.py                      # Main dataset builder
  ├── dataset/
  │   ├── training_data_100k.jsonl              # 100K training samples
  │   └── training_data_prompt_completion.jsonl # Formatted training data
  ├── train/
  │   ├── train_codellama_qlora_linux_bugfix.py # Main training script
  │   ├── train_codellama_qlora_simple.py       # Simplified training
  │   ├── download_codellama_model.py           # Model download utility
  │   └── output/
  │       └── qlora-codellama-bugfix/           # Trained model checkpoints
  ├── evaluate/
  │   ├── evaluate_linux_bugfix_model.py        # Evaluation script
  │   ├── test_samples.jsonl                    # Test dataset
  │   └── output/                               # Evaluation results
  │       ├── eval_results.csv                  # Detailed results
  │       └── eval_results.json                 # JSON format results
  ├── requirements.txt                          # Python dependencies
  ├── README.md                                 # This file
  └── PROJECT_STRUCTURE.md                      # Detailed project overview
  ```

  ---

  ## 🧩 Features

  * 🔧 **Efficient Fine-tuning**: QLoRA with 4-bit quantization cuts memory usage by roughly 75%
  * 🧠 **Real-world commits**: From actual Linux kernel development
  * 💡 **Context-aware**: Code context extraction around bug lines
  * 💻 **Output-ready**: Generates valid Git-style diffs
  * 📈 **Strong Performance**: BLEU score of 33.87 and ROUGE-1 F1 of 0.4355 on the test split
  * 🚀 **Production-ready**: Optimized for real-world deployment

  ---

  ## 📈 Evaluation Metrics

  * **BLEU**: Translation-style n-gram precision of the generated diff against the reference diff
  * **ROUGE**: Unigram (ROUGE-1), bigram (ROUGE-2), and longest-common-subsequence (ROUGE-L) overlap with the reference diff
  * **Human Evaluation**: Subjective patch quality assessment

  ### Current Performance
  - **BLEU Score**: 33.87
  - **ROUGE-1 F1**: 0.4355 (unigram overlap)
  - **ROUGE-2 F1**: 0.3457 (bigram overlap)
  - **ROUGE-L F1**: 0.3612 (longest common subsequence)

  ---

  ## 🧪 Use Cases

  * **Automated kernel bug fixing**: Generate fixes for common kernel bugs
  * **Code review assistance**: Help reviewers identify potential issues
  * **Teaching/debugging kernel code**: Educational tool for kernel development
  * **Research in automated program repair (APR)**: Academic research applications
  * **CI/CD integration**: Automated testing and fixing in development pipelines

  ---

  ## 🔬 Technical Highlights

  ### Memory & Speed Optimizations

  * 4-bit quantization (NF4)
  * Gradient checkpointing
  * Mixed precision (bfloat16)
  * Gradient accumulation
  * LoRA parameter efficiency

  ### Training Efficiency

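  * **QLoRA**: Keeps the base model frozen in 4-bit form and trains only the LoRA adapters, reducing memory usage by ~75%
  * **4-bit quantization**: NF4 storage of the base weights, the main source of that saving
  * **Gradient checkpointing**: Recomputes activations during the backward pass, trading compute for memory
  * **Mixed precision**: bfloat16 compute for faster training with maintained accuracy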
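
The ~75% figure is consistent with simple arithmetic on the weights alone, as the sketch below shows. It assumes a 16-bit baseline and a round 7 billion parameters, and it ignores quantization constants, the LoRA adapters, activations, and optimizer state, so real memory use will differ.

```python
PARAMS = 7e9  # approximate parameter count of a 7B model (assumption: rounded)


def weight_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


fp16 = weight_gb(16)  # assumed baseline: 16-bit weights
nf4 = weight_gb(4)    # 4-bit quantized weights
saving = 1 - nf4 / fp16

print(f"16-bit weights: {fp16:.1f} GB")
print(f"4-bit weights:  {nf4:.1f} GB")
print(f"reduction:      {saving:.0%}")
```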

  ---

  ## 🛠️ Advanced Usage

  ### Custom Training

  ```bash
  # Train with custom parameters
  python train_codellama_qlora_linux_bugfix.py \
      --learning_rate 1e-4 \
      --num_epochs 5 \
      --batch_size 32 \
      --lora_r 32 \
      --lora_alpha 16
  ```

  ### Evaluation on Custom Data

  ```bash
  # Evaluate on your own test set
  python evaluate_linux_bugfix_model.py \
      --test_file your_test_data.jsonl \
      --output_dir custom_eval_results
  ```

  ---

  ## 🤝 Contributing

  1. Fork this repo
  2. Create a feature branch (`git checkout -b feature/amazing-feature`)
  3. Commit your changes (`git commit -m 'Add amazing feature'`)
  4. Push to the branch (`git push origin feature/amazing-feature`)
  5. Open a Pull Request 🙌

  ### Development Guidelines

  - Follow PEP 8 style guidelines
  - Add tests for new features
  - Update documentation for API changes
  - Ensure all tests pass before submitting PR

  ---

  ## 📄 License

  MIT License – see `LICENSE` file for details.

  ---

  ## 🙏 Acknowledgments

  * **Meta** for the CodeLLaMA base model
  * **Hugging Face** for the Transformers and PEFT libraries
  * **The Linux kernel community** for open access to commit data
  * **Microsoft** for introducing the LoRA technique
  * **University of Washington** for the QLoRA research

  ---

  ## 📚 References

  * [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
  * [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
  * [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
  * [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)

  ---

  ## 📞 Support

  For questions, issues, or contributions:
  - Open an issue on GitHub
  - Check the project documentation
  - Review the evaluation results in `evaluate/output/`

  ---

  ## 🔄 Version History

  - **v1.0.0**: Initial release with QLoRA training
  - **v1.1.0**: Added parallel dataset extraction
  - **v1.2.0**: Improved evaluation metrics and documentation