# Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
This repository contains the Qwen2.5-Coder-1.5B model, a supervised fine-tuned (SFT) checkpoint developed as part of the Re:Form framework. This work was presented in the paper Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
Re:Form aims to revolutionize formal software verification by reducing the reliance on extensive human-annotated priors. It achieves this by grounding Large Language Models (LLMs) in rigorous formal systems, where generative models operate directly in formal language spaces (e.g., Dafny). This approach enables the automatic and mathematically provable verification of their reasoning processes and outcomes, addressing a critical challenge in achieving large-scale, reliable formal software verification.
The pipeline introduces an automatic and scalable data curation process and integrates careful Reinforcement Learning (RL) designs with feedback from a formal language verifier. The project also introduces DafnyComp, a benchmark of compositional formal programs with auto-formalized specifications. Even small models (down to 0.5B parameters in the paper) can generate syntactically valid and verifiable Dafny code, surpassing proprietary models, and RL further improves performance and generalization.
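To make the target concrete, below is a minimal hand-written sketch (not taken from the paper or from DafnyComp; the method name `Abs` is purely illustrative) of the kind of specification-carrying Dafny code the framework is about: the `ensures` clauses are formal post-conditions that the Dafny verifier checks automatically, which is what makes a generation "verifiable".

```dafny
// Illustrative example: a method with formal post-conditions.
// The Dafny verifier proves the ensures clauses hold for every input,
// so a candidate program is accepted exactly when this proof succeeds.
method Abs(x: int) returns (y: int)
  ensures y >= 0
  ensures y == x || y == -x
{
  if x < 0 {
    y := -x;
  } else {
    y := x;
  }
}
```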
## Project Resources
- Paper (Hugging Face): Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
- Paper (arXiv): https://arxiv.org/abs/2507.16331
- Project Page: https://veri-code.github.io/ReForm-page
- GitHub Repository: https://github.com/veri-code/ReForm
## How to use
You can load and use this model with the Hugging Face `transformers` library. Please ensure you set `trust_remote_code=True` when loading, as the model uses custom tokenization and model architecture.
````python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# This repository contains the Qwen2.5-Coder-1.5B model.
# Other available checkpoints from the Veri-Code organization include:
#   SFT models: sft_0.5B, sft_1.5B, sft_3B, sft_7B, sft_14B
#   RL fine-tuned model: 14B-RL-entropy
model_id = "Veri-Code/ReForm-Qwen2.5-Coder-1.5B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use torch.float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)

# Example prompt: translate Python code into Dafny
prompt = """```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
```

Translate the above Python code into Dafny, including pre- and post-conditions: """

# Tokenize the input
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate a response
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,  # Adjust max_new_tokens as needed
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
````
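For reference, a hand-written Dafny translation of the factorial example that the Dafny verifier accepts could look like the sketch below. It is illustrative only: actual model output will vary, the names `Factorial` and `ComputeFactorial` are ours, and the loop invariants are just one way to make the proof go through.

```dafny
// One possible verified translation of the factorial example (illustrative).
// The ensures clause ties the imperative method to a pure specification function.
function Factorial(n: nat): nat
{
  if n == 0 then 1 else n * Factorial(n - 1)
}

method ComputeFactorial(n: nat) returns (result: nat)
  ensures result == Factorial(n)
{
  result := 1;
  var i := 0;
  while i < n
    invariant 0 <= i <= n
    invariant result == Factorial(i)
  {
    i := i + 1;
    result := result * i;
  }
}
```

A generated candidate can then be checked by saving it to a `.dfy` file and running the Dafny toolchain on it (for example, `dafny verify program.dfy` with a recent Dafny release; the file name here is just a placeholder).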
## Citation
If you find this work helpful in your research, please consider citing the paper:
```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331},
}
```