Re:Form: Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs

This repository contains a model from the Re:Form framework, presented in the paper Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.

Re:Form is a novel approach that addresses the challenges of unreliable and unscalable verification processes in Large Language Models (LLMs) that typically operate with informal languages. It proposes grounding LLMs in rigorous formal systems, specifically using the Dafny formal language, to enable automatic and mathematically provable verification of their reasoning processes and outcomes. This capability is pivotal for achieving large-scale, reliable formal software verification.

Overview

The Re:Form pipeline focuses on reducing human priors in complex programming tasks through an automatic and scalable data curation pipeline. It integrates careful Reinforcement Learning (RL) designs with feedback from a formal language verifier. This methodology allows even small models (e.g., 0.5B) to generate syntactically valid and verifiable Dafny code, surpassing proprietary models and demonstrating stronger generalization to out-of-domain tasks, including on the challenging DafnyComp benchmark.

Overall Pipeline

Usage

You can load and use the Re:Form models with the Hugging Face transformers library. The model uses the Qwen2ForCausalLM architecture.

Here's an example of how to perform text generation for Dafny code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace "Veri-Code/sft_7B" with the specific model checkpoint you want to load
# e.g., "Veri-Code/sft_0.5B", "Veri-Code/sft_1.5B", "Veri-Code/sft_3B", "Veri-Code/sft_14B", or "Veri-Code/14B-RL-entropy"
model_id = "Veri-Code/sft_7B" 

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, # Adjust dtype based on your hardware (e.g., torch.float16 for GPUs without bfloat16 support)
    device_map="auto"
)

# Example: Generate Dafny code for a function that returns the absolute value
dafny_prompt = """function Abs(x: int): int
ensures Abs(x) >= 0
{
"""

inputs = tokenizer.apply_chat_template([{"role": "user", "content": dafny_prompt}], return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)

For more detailed usage, including training and evaluation scripts, please refer to the official GitHub repository.

Citation

If you use this work in your research, please cite:

@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny}, 
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331}, 
}
Downloads last month
25
Safetensors
Model size
14.8B params
Tensor type
F32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Veri-Code/ReForm-14B-RL-entropy

Quantizations
1 model

Collection including Veri-Code/ReForm-14B-RL-entropy