---
base_model: bigcode/starcoder2-7b
library_name: peft
pipeline_tag: text-generation
license: other
tags:
- code-generation
---
# Model Card for StarCoder2-SFT
This is a PEFT adapter for StarCoder2, trained with Supervised Fine-Tuning (SFT) on the DiSCo dataset and presented in the paper [Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences](https://huggingface.co/papers/2506.00419). It aims to improve secure code generation in large language models.
To use this adapter for downstream tasks, either merge it into its base model, `bigcode/starcoder2-7b`, or load it directly on top of the base model with the `peft` library (both approaches are shown in the Usage section below).
**Code Repository**: https://github.com/StonyBrookNLP/disco-lpo
## Abstract
LLM generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with a security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality.
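For intuition only, the localized preference idea can be pictured as a SimPO-style objective computed over just the security-relevant tokens of the preferred (secure) and dispreferred (insecure) responses. The sketch below is purely illustrative and is not the paper's implementation: the token-masking procedure, the loss coefficients, and the code-quality regularizer mentioned above are simplified away or omitted.

```python
import torch
import torch.nn.functional as F

def masked_preference_loss(logp_win, logp_lose, mask_win, mask_lose, beta=2.0, gamma=0.5):
    """Schematic SimPO-style preference loss restricted to masked tokens.

    logp_*: per-token log-probabilities of the winning/losing response, shape (seq_len,)
    mask_*: 1.0 for security-related tokens, 0.0 elsewhere (illustrative; the actual
            masking strategy and hyperparameters are defined in the paper)
    """
    # Average log-likelihood over only the masked tokens of each response
    win_score = (logp_win * mask_win).sum() / mask_win.sum().clamp(min=1)
    lose_score = (logp_lose * mask_lose).sum() / mask_lose.sum().clamp(min=1)
    # Margin-based preference objective on the localized scores
    return -F.logsigmoid(beta * (win_score - lose_score) - gamma)
```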
## Usage
To use this adapter for code generation, first load the base model, then load this PEFT adapter on top of it.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model_id = "bigcode/starcoder2-7b"
adapter_id = "StonyBrookNLP/StarCoder2-SFT" # This model adapter
# Load the tokenizer for the base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load the base model
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
device_map="auto" # or specify your device, e.g., "cuda:0"
)
# Load the PEFT adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval() # Set the model to evaluation mode
# Example prompt for secure code generation
prompt = (
    "# Write a Python function that safely handles user input.\n"
    "def get_safe_input(prompt):\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Generate text
with torch.no_grad():
generated_ids = model.generate(
input_ids,
max_new_tokens=100,
do_sample=True,
temperature=0.7,
top_p=0.9,
pad_token_id=tokenizer.eos_token_id # Important for generation
)
# Decode and print the generated text
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
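If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can be folded into the base weights instead. A minimal sketch, assuming the adapter is a LoRA-style adapter that supports `merge_and_unload` (the output directory name is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "bigcode/starcoder2-7b"
adapter_id = "StonyBrookNLP/StarCoder2-SFT"

# Load the base model and the adapter, then fold the adapter weights into the base weights
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

# Save a standalone model that can be loaded with transformers alone
merged.save_pretrained("starcoder2-sft-merged")  # illustrative output path
AutoTokenizer.from_pretrained(base_model_id).save_pretrained("starcoder2-sft-merged")
```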
## Citation
If you use the resources provided in this work, please cite:
```bibtex
@article{saqib2025teaching,
title={Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences},
author={Saqib, Mohammad and Chakraborty, Saikat and Karmaker, Santu and Balasubramanian, Niranjan},
journal={arXiv preprint arXiv:2506.00419},
year={2025}
}
```