brainbug / README.md
Sagar123x's picture
Upload trained BrainBug CodeT5 model
3cc41e1 verified
metadata
language:
  - code
license: mit
tags:
  - code-generation
  - bug-fixing
  - code-repair
  - codet5
  - debugging
datasets:
  - custom
metrics:
  - accuracy
  - exact-match
library_name: transformers
pipeline_tag: text2text-generation

brainbug

Model Description

This is a fine-tuned CodeT5 model for automatic bug detection and code repair. The model has been trained to identify and fix various types of programming errors in Python code.

Supported Error Types

  • WVAV: Wrong Variable Used in Variable Assignment
  • MLAC: Missing Line After Call
  • WPFV: Wrong Parameter in Function/Method Call
  • And more...

Model Details

  • Base Model: Salesforce/codet5-base
  • Fine-tuned on: Custom bug-fix dataset
  • Task: Code-to-Code generation (bug fixing)
  • Language: Python
  • Model Size: 220M parameters

Usage

from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(fixed_code)

Training Details

  • Training Epochs: 10
  • Batch Size: 1 (with gradient accumulation)
  • Learning Rate: 3e-5
  • Optimizer: AdamW
  • Hardware: NVIDIA RTX 4050 (6GB)

Performance Metrics

  • Exact Match Accuracy: 2.60%
  • Token-Level Accuracy: 28.52%
  • Average Similarity: 76.75%

Limitations

  • Trained primarily on Python code
  • Best performance on error types seen during training
  • May not handle very long code snippets (>256 tokens)
  • Requires error type specification for optimal results

Citation

@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}

License

MIT License

Contact

For questions or issues, please open an issue on the model repository.