# Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
This repository contains models from the Re:Form framework, as presented in the paper Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
Re:Form introduces a novel approach to formal software verification that integrates Reinforcement Learning (RL) with Large Language Models (LLMs), focusing on formal languages such as Dafny. The goal is to reduce the dependency on extensive human-annotated priors, which are costly to obtain and hard to scale for LLMs operating on informal natural language. By grounding LLMs in a rigorous formal system, Re:Form enables automatic, mathematically provable verification of both the reasoning process and its outcome, addressing scalability and reliability issues in generating verifiable programs.
The Re:Form pipeline combines an automatic, scalable data curation process with carefully designed RL strategies that incorporate feedback from a formal language verifier. The paper also introduces DafnyComp, a benchmark of compositional formal programs. Even small models (e.g., 0.5B parameters) trained with supervised fine-tuning (SFT) can generate syntactically valid and verifiable Dafny code, surpassing proprietary models; RL with regularization further improves performance and generalization to out-of-domain tasks.
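The reward in the RL stage comes from the verifier rather than from human labels. As an illustration only, the sketch below shows one way such verifier feedback could be turned into a scalar reward by shelling out to the Dafny CLI; the function name, the `resolve`/`verify` staging, and the specific reward values are assumptions for exposition, not the exact scheme used in the paper.

```python
import subprocess
import tempfile

def verifier_reward(candidate: str, timeout_s: int = 60) -> float:
    """Hypothetical reward shaping from Dafny verifier feedback:
    0.0 if the program does not parse/resolve, 0.5 if it is syntactically
    valid but its proof obligations fail, 1.0 if it fully verifies.
    Assumes a local Dafny 4.x installation with `dafny` on the PATH.
    """
    with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
        f.write(candidate)
        path = f.name

    def run(subcommand: str) -> bool:
        try:
            return subprocess.run(
                ["dafny", subcommand, path],
                capture_output=True, text=True, timeout=timeout_s,
            ).returncode == 0
        except subprocess.TimeoutExpired:
            return False

    if not run("resolve"):   # parse and type-check only
        return 0.0
    if not run("verify"):    # try to discharge the proof obligations
        return 0.5
    return 1.0
```

During RL training, such a reward would be computed for each sampled program and fed to the policy update; the graded signal distinguishes programs that fail to parse from programs that parse but do not verify.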
- Paper: Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny (https://arxiv.org/abs/2507.16331)
- Project Page: https://veri-code.github.io/ReForm-page
- Code: https://github.com/Veri-Code/Veri-Code
## Usage

This model is compatible with the Hugging Face `transformers` library. You can load it for text generation, in particular for generating Dafny code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace with the checkpoint you want to use, e.g., "Veri-Code/sft_0.5B";
# other model sizes are available in the Veri-Code collection.
model_name = "Veri-Code/sft_0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # use bfloat16 if supported, otherwise torch.float16
    device_map="auto",           # automatically place the model across available devices (e.g., multiple GPUs)
)
model.eval()  # set the model to evaluation mode

# Example: generate Dafny code for a simple function
prompt = """
method Add(x: int, y: int) returns (sum: int)
    ensures sum == x + y
{
"""
print(f"Input Prompt:\n{prompt}")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a continuation of the code
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,  # generate up to 100 new tokens
        do_sample=True,
        temperature=0.7,     # lower for more deterministic output
        top_p=0.9,           # nucleus sampling threshold
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Generated Dafny Code:\n{generated_text}")
```
## Citation

If you use this work in your research, please cite:
```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331},
}
```