# Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

This repository contains ReForm-SFT-1.5B, a supervised fine-tuned (SFT) checkpoint of Qwen2.5-Coder-1.5B developed as part of the Re:Form framework. This work was presented in the paper [Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny](https://arxiv.org/abs/2507.16331).

Re:Form aims to revolutionize formal software verification by reducing the reliance on extensive human-annotated priors. It achieves this by grounding Large Language Models (LLMs) in rigorous formal systems, where generative models operate directly in formal language spaces (e.g., Dafny). This approach enables the automatic and mathematically provable verification of their reasoning processes and outcomes, addressing a critical challenge in achieving large-scale, reliable formal software verification.

The pipeline introduces an automatic and scalable data curation process and integrates careful Reinforcement Learning (RL) designs with feedback from a formal language verifier. The project also introduces DafnyComp, a benchmark of compositional formal programs with auto-formalized specifications. Even small fine-tuned models (from 0.5B parameters in the paper) generate syntactically valid and verifiable Dafny code, surpassing proprietary models, and RL training further improves performance and generalization.
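
The paper's exact reward design is not reproduced in this card, but the core idea of using the formal verifier as an automatic supervision signal can be sketched as follows. This is a minimal illustration assuming the Dafny 4.x CLI is installed and on PATH; the function name, timeout, and binary 0/1 reward are illustrative assumptions, not the released training code.

```python
import subprocess
import tempfile
from pathlib import Path


def dafny_verification_reward(dafny_code: str, timeout_s: int = 60) -> float:
    """Illustrative binary reward: 1.0 if the Dafny verifier accepts the program, else 0.0.

    Assumes the `dafny` CLI (Dafny 4.x) is installed and on PATH; the actual
    reward shaping used in Re:Form may differ.
    """
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "candidate.dfy"
        source.write_text(dafny_code)
        try:
            result = subprocess.run(
                ["dafny", "verify", str(source)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # treat verifier timeouts as failures
        return 1.0 if result.returncode == 0 else 0.0
```

Because the verifier's verdict is mechanical, a reward of this kind needs no human-annotated labels, which is the sense in which human priors are reduced.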

## How to use

You can load and use this model with the Hugging Face `transformers` library, as shown below. The example passes `trust_remote_code=True` when loading the tokenizer and model.

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# This repository contains the Qwen2.5-Coder-1.5B SFT checkpoint.
# Other available checkpoints from the Veri-Code organization include:
#   SFT models: sft_0.5B, sft_1.5B, sft_3B, sft_7B, sft_14B
#   RL fine-tuned model: 14B-RL-entropy
model_id = "Veri-Code/ReForm-Qwen2.5-Coder-1.5B"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use torch.float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)

# Example prompt: translate Python code into Dafny
prompt = """```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
```

Translate the above Python code into Dafny, including pre- and post-conditions: """

# Tokenize the input
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate a response
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,  # adjust max_new_tokens as needed
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
````
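
The model's output format can vary between runs. As a hedged post-processing sketch continuing from the snippet above, the generated Dafny code can be pulled out of a fenced block, if one is present, and written to a file so it can be checked by the verifier; the regex, fallback, and file name are illustrative assumptions rather than part of the released tooling.

```python
import re

# Illustrative post-processing (an assumption, not part of the released tooling):
# pull the first fenced Dafny block out of the generated text, falling back to
# the raw text if the model did not use a fence, and save it to a .dfy file.
match = re.search(r"`{3}dafny\s*(.*?)`{3}", generated_text, flags=re.DOTALL)
dafny_code = match.group(1) if match else generated_text

with open("factorial.dfy", "w") as f:
    f.write(dafny_code)
```

If the Dafny 4.x toolchain is installed, the saved file can then be checked from the command line with `dafny verify factorial.dfy`.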


## Citation

If you find this work helpful in your research, please consider citing the paper:

```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny}, 
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331}, 
}
```