---
base_model: bigcode/starcoder2-7b
library_name: peft
pipeline_tag: text-generation
license: other
tags:
- code-generation
---
# Model Card for StarCoder2-SFT
This is a PEFT adapter for StarCoder2, trained with Supervised Fine-Tuning (SFT) on the DiSCo dataset and presented in the paper [Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences](https://huggingface.co/papers/2506.00419). It aims to improve secure code generation in large language models.
To use this adapter for downstream tasks, either merge it into its base model, `bigcode/starcoder2-7b`, or load it directly on top of the base model with the `peft` library (both approaches are shown in the Usage section below).
**Code Repository**: https://github.com/StonyBrookNLP/disco-lpo
## Abstract
LLM generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with a security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality.
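For intuition only, the localized preference idea can be pictured as a SimPO-style objective computed over just the security-relevant tokens of the preferred (secure) and dispreferred (insecure) responses. The sketch below is purely illustrative and is not the paper's implementation: the token-masking procedure, the loss coefficients, and the code-quality regularizer mentioned above are simplified away or omitted.

```python
import torch
import torch.nn.functional as F

def masked_preference_loss(logp_win, logp_lose, mask_win, mask_lose, beta=2.0, gamma=0.5):
    """Schematic SimPO-style preference loss restricted to masked tokens.

    logp_*: per-token log-probabilities of the winning/losing response, shape (seq_len,)
    mask_*: 1.0 for security-related tokens, 0.0 elsewhere (illustrative; the actual
            masking strategy and hyperparameters are defined in the paper)
    """
    # Average log-likelihood over only the masked tokens of each response
    win_score = (logp_win * mask_win).sum() / mask_win.sum().clamp(min=1)
    lose_score = (logp_lose * mask_lose).sum() / mask_lose.sum().clamp(min=1)
    # Margin-based preference objective on the localized scores
    return -F.logsigmoid(beta * (win_score - lose_score) - gamma)
```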
## Usage
To use this adapter for code generation, first load the base model, then load this PEFT adapter on top of it.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model_id = "bigcode/starcoder2-7b"
adapter_id = "StonyBrookNLP/StarCoder2-SFT" # This model adapter
# Load the tokenizer for the base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load the base model
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
device_map="auto" # or specify your device, e.g., "cuda:0"
)
# Load the PEFT adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval() # Set the model to evaluation mode
# Example prompt for secure code generation
prompt = (
    "# Write a Python function that safely handles user input.\n"
    "def get_safe_input(prompt):\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Generate text
with torch.no_grad():
generated_ids = model.generate(
input_ids,
max_new_tokens=100,
do_sample=True,
temperature=0.7,
top_p=0.9,
pad_token_id=tokenizer.eos_token_id # Important for generation
)
# Decode and print the generated text
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
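If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can be folded into the base weights instead. A minimal sketch, assuming the adapter is a LoRA-style adapter that supports `merge_and_unload` (the output directory name is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "bigcode/starcoder2-7b"
adapter_id = "StonyBrookNLP/StarCoder2-SFT"

# Load the base model and the adapter, then fold the adapter weights into the base weights
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

# Save a standalone model that can be loaded with transformers alone
merged.save_pretrained("starcoder2-sft-merged")  # illustrative output path
AutoTokenizer.from_pretrained(base_model_id).save_pretrained("starcoder2-sft-merged")
```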
## Citation
If you use the resources provided in this work, please cite:
```bibtex
@article{saqib2025teaching,
title={Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences},
author={Saqib, Mohammad and Chakraborty, Saikat and Karmaker, Santu and Balasubramanian, Niranjan},
journal={arXiv preprint arXiv:2506.00419},
year={2025}
}
```