
PIPer: On-Device Environment Setup via Online Reinforcement Learning

Paper | Code


Democratizing environment setup with on-device sized models that match the performance of much larger proprietary systems

🎯 Overview

Environment setup (the process of configuring a system to build and run a specific software project) remains a persistent challenge in software engineering. PIPer addresses this by training specialized on-device models that automatically generate correct Bash scripts for environment configuration.

Our approach combines:

  • 📚 Supervised Fine-Tuning (SFT) on executable scripts from larger models
  • 🎯 Reinforcement Learning with Verifiable Rewards (RLVR) using a lightweight proxy LLM reward
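
The paper's exact reward formulation lives in the code and paper; as an illustrative sketch only, a lightweight proxy reward for a generated Bash script might gate a judge score behind a cheap verifiable syntax check (the `judge` callable stands in for the LLM reward model and is an assumption here):

```python
import subprocess
from typing import Callable

def proxy_reward(script: str, judge: Callable[[str], float]) -> float:
    """Illustrative proxy reward (not the paper's exact formulation):
    a script must at least parse before the judge score counts."""
    # `bash -n` parses the script without executing it -- a cheap, verifiable signal.
    syntax_ok = subprocess.run(
        ["bash", "-n"], input=script, text=True, capture_output=True
    ).returncode == 0
    if not syntax_ok:
        return 0.0
    # `judge` is a stand-in for the lightweight LLM reward, returning a score in [0, 1].
    return judge(script)

# Stub judge for demonstration: prefers scripts that install dependencies.
stub_judge = lambda s: 1.0 if "pip install" in s else 0.5
print(proxy_reward("pip install -r requirements.txt\n", stub_judge))
print(proxy_reward("if then fi\n", stub_judge))  # invalid Bash -> reward 0.0
```

Because `bash -n` never executes the script, this kind of proxy signal is cheap enough to run inside an online RL loop.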

πŸ† Key Results

| Model     | Size | EnvBench avg@5 | Cost per 1M tokens |
|-----------|------|----------------|--------------------|
| PIPer     | 8B   | 19.4           | $0.60              |
| GPT-4o    | -    | 19.4           | $15.00             |
| Qwen3-32B | 32B  | 16.2           | $2.00              |
| Qwen3-8B  | 8B   | 2.6            | $0.60              |

🎉 PIPer achieves a 9× improvement over its base model while matching GPT-4o performance at 25× lower cost
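
The cost ratio follows directly from the per-token prices in the table:

```python
# Sanity check of the cost claim, using the $/1M-token prices from the table above.
piper_cost, gpt4o_cost = 0.60, 15.00
ratio = gpt4o_cost / piper_cost
assert round(ratio) == 25  # matches the "25x lower cost" claim
print(f"GPT-4o costs about {ratio:.0f}x more per million tokens than PIPer")
```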

Performance vs Cost Analysis

📦 Available Artifacts

🤖 Model Checkpoints

| Model               | Description                   | HuggingFace Link                       |
|---------------------|-------------------------------|----------------------------------------|
| 🏅 PIPer (Full)     | Complete SFT+RL trained model | JetBrains-Research/PIPer-8B            |
| 🎯 PIPer (RL-only)  | RLVR checkpoint only          | JetBrains-Research/PIPer-8B-RL-only    |
| 📚 PIPer (SFT-only) | Supervised fine-tuning only   | JetBrains-Research/PIPer-8B-SFT-only   |
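
As a hedged quickstart sketch, the checkpoint can be loaded like any Qwen3-based chat model with Hugging Face Transformers; the prompt wording below is an assumption for illustration, not the exact prompting used in the paper:

```python
# Hedged usage sketch for JetBrains-Research/PIPer-8B. The prompt format is
# an assumption; see the paper and repository for the exact prompting.

def build_messages(repo_readme: str) -> list:
    """Wrap a repository description into a chat-style prompt (assumed format)."""
    return [{
        "role": "user",
        "content": ("Write a Bash script that sets up the environment for "
                    "this repository:\n\n" + repo_readme),
    }]

def generate_setup_script(repo_readme: str, max_new_tokens: int = 1024) -> str:
    # Local imports keep the sketch readable without a GPU environment installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "JetBrains-Research/PIPer-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(repo_readme),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```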

📊 Datasets

| Dataset               | Description                                              | HuggingFace Link                              |
|-----------------------|----------------------------------------------------------|-----------------------------------------------|
| EnvBench Zero-shot RL | Training prompts and evaluation data                     | JetBrains-Research/PIPer-envbench-zeroshot-rl |
| EnvBench SFT          | 2500 zero-shot trajectories from Qwen-32B in ShareGPT format | JetBrains-Research/PIPer-SFT-2500-sharegpt |
| PIPer Eval            | Full evaluation results for EnvBench and Repo2Run        | JetBrains-Research/PIPer-eval                 |

🚀 Reproduce the Results

We use uv for dependency management and Ray for distributed training.

git clone https://github.com/JetBrains-Research/PIPer.git
cd PIPer
git submodule update --init --recursive
uv sync

To run the experiments, you need a node with at least four H200 GPUs and a running Ray cluster. You can then launch all the experiments with the following command:

uv run piper/hparams_entrypoint.py --multirun +experiment=llm-reward

You can look up the experiment Hydra configurations in the piper/config/ folder, or print the full resolved config with the following command:

uv run piper/hparams_entrypoint.py +experiment=llm-reward --info config

📊 Evaluation Benchmarks

| Benchmark       | Description             | Metric  | Our Result |
|-----------------|-------------------------|---------|------------|
| EnvBench-Python | 329 Python repositories | pass@5  | 🏆 27/329  |
| Repo2Run        | 420 Python repositories | pass@5  | 🏆 103/420 |
| Terminal-Bench  | 80 terminal tasks       | pass@10 | 4/80       |
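
The two metrics differ: avg@k averages the per-repository success rate over k sampled scripts, while pass@k counts a repository as solved if any of its k samples succeeds. A minimal sketch of the standard definitions (an assumption; not code from this repository):

```python
def avg_at_k(runs: list) -> float:
    """Mean per-repository success rate across k sampled scripts.
    `runs` is a list of per-repository lists of booleans (one per sample)."""
    return sum(sum(r) / len(r) for r in runs) / len(runs)

def pass_at_k(runs: list) -> int:
    """Number of repositories solved by at least one of the k samples."""
    return sum(any(r) for r in runs)

# Two repositories, 5 samples each: one solved 3/5 times, one never solved.
runs = [[True, False, True, True, False], [False] * 5]
assert pass_at_k(runs) == 1
assert abs(avg_at_k(runs) - 0.3) < 1e-9
```

Because pass@k credits a repository for a single lucky sample, it is always at least as generous as avg@k on the same runs.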

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
