THU-KEG/LLaDA-8B-BGPO-sudoku


linny2002 committed on Oct 11

Commit b691968 (verified) · 1 parent: 1a9fe0f

Create README.md

Files changed (1)

README.md +49 -0

README.md ADDED

	@@ -0,0 +1,49 @@

+---
+license: apache-2.0
+language:
+- en
+tags:
+- reinforcement-learning
+- sudoku
+- dllm
+- bgpo
+- llada
+size_categories:
+- 8B
+---
+# LLaDA-8B-BGPO-sudoku
+[![Paper](https://img.shields.io/badge/Paper-arXiv:-red)]()
+[![Code](https://img.shields.io/badge/Code-GitHub-blue)]()
+## Model Description
+**LLaDA-8B-BGPO-sudoku** is an 8-billion-parameter diffusion large language model (dLLM) fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to improve Sudoku solving.
+## Model Details
+- **Model Type**: Diffusion Large Language Model (dLLM)
+- **Parameters**: 8 billion
+- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
+- **Base Model**: LLaDA-8B-Instruct
+- **Task**: Sudoku
+- **Language**: English
+## Training Details
+- **Training Steps**: 400 steps
+- **Response Length**: 256 tokens
+- **Train Diffusion Steps**: 128
+- **Eval Diffusion Steps**: 256
+- **Block Size**: 32
+- **Monte Carlo Sample Size ($n_t$)**: 32
+- **Learning Rate**: 5e-7
+- **Batch Size**: 16
+- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
+## Usage & Limitations
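+- Fine-tuned specifically for Sudoku solving.
+- Performance on other tasks may vary, since training targeted Sudoku only.
+- Inference requires hardware that can host an 8-billion-parameter model (about 16 GB for the weights alone at 16-bit precision).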
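
The Training Details list gives a response length, a block size, and separate diffusion step counts for training and evaluation. The sketch below works out what those numbers could imply per block. It assumes block-wise (semi-autoregressive) decoding in which the diffusion steps are split evenly across blocks, as in LLaDA's reference sampler; the model card itself does not confirm this schedule, and `decoding_schedule` is a hypothetical helper name, not part of any released code.

```python
def decoding_schedule(response_length: int, block_size: int, diffusion_steps: int) -> dict:
    """Derive per-block quantities for block-wise masked-diffusion decoding.

    Assumption (not stated in the model card): diffusion steps are divided
    evenly across blocks, and each step unmasks the same number of tokens.
    """
    if response_length % block_size != 0:
        raise ValueError("response_length must be a multiple of block_size")
    num_blocks = response_length // block_size

    if diffusion_steps % num_blocks != 0:
        raise ValueError("diffusion_steps must be a multiple of the number of blocks")
    steps_per_block = diffusion_steps // num_blocks

    if block_size % steps_per_block != 0:
        raise ValueError("block_size must be a multiple of steps_per_block")
    tokens_per_step = block_size // steps_per_block

    return {
        "num_blocks": num_blocks,
        "steps_per_block": steps_per_block,
        "tokens_per_step": tokens_per_step,
    }


# Values taken from the Training Details list.
train_schedule = decoding_schedule(response_length=256, block_size=32, diffusion_steps=128)
eval_schedule = decoding_schedule(response_length=256, block_size=32, diffusion_steps=256)

print("train:", train_schedule)
print("eval: ", eval_schedule)
```

Under this assumption, the training setting unmasks two tokens per step within each 32-token block, while the evaluation setting unmasks one token per step, so evaluation decodes more finely than training.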