---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- sudoku
- dllm
- bgpo
- llada
size_categories:
- 8B
---
# LLaDA-8B-BGPO-sudoku

## Model Description
LLaDA-8B-BGPO-sudoku is an 8-billion-parameter diffusion large language model (dLLM) fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its Sudoku-solving ability.
## Model Details

- Model Type: Diffusion Large Language Model (dLLM)
- Parameters: 8 billion
- Training Method: Boundary-Guided Policy Optimization (BGPO)
- Base Model: LLaDA-8B-Instruct
- Task: Sudoku
- Language: English
 
## Training Details

- Training Steps: 400
- Response Length: 256 tokens
- Train Diffusion Steps: 128
- Eval Diffusion Steps: 256
- Block Size: 32
- Monte Carlo Sample Size ($n_t$): 32
- Learning Rate: 5e-7
- Batch Size: 16
- Framework: Built on VeRL (Volcengine Reinforcement Learning)
 
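The decoding hyperparameters above imply a semi-autoregressive schedule: a 256-token response is generated block by block (8 blocks of 32 tokens), with the diffusion steps spread across blocks. The toy sketch below only counts how many masked tokens are revealed per step under an even split; it is illustrative, not the actual LLaDA/VeRL sampler (which chooses *which* tokens to unmask, e.g. by model confidence):

```python
def block_diffusion_schedule(response_len: int, block_size: int, total_steps: int):
    """Toy schedule: split `total_steps` evenly across blocks and reveal an
    equal share of each block's still-masked tokens at every step.
    Returns the number of tokens revealed at each step."""
    num_blocks = response_len // block_size      # 256 // 32 = 8 blocks
    steps_per_block = total_steps // num_blocks  # 128 // 8 = 16 steps per block
    per_step = []
    for _ in range(num_blocks):
        remaining = block_size
        for s in range(steps_per_block):
            # reveal an even share of the tokens still masked in this block
            reveal = remaining // (steps_per_block - s)
            per_step.append(reveal)
            remaining -= reveal
    return per_step

schedule = block_diffusion_schedule(256, 32, 128)
assert sum(schedule) == 256   # every token is revealed exactly once
assert len(schedule) == 128   # matches the train diffusion-step budget
```

With the training settings (128 steps for 256 tokens), each step reveals 2 tokens; the eval setting of 256 steps halves that to 1 token per step, trading speed for per-step refinement.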
## Usage & Limitations

- Primarily designed for Sudoku tasks.
- Performance may vary on tasks outside this domain.
- Requires sufficient computational resources for 8B-parameter inference.
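A minimal loading sketch, assuming this checkpoint exposes the same `trust_remote_code` interface as the base LLaDA-8B-Instruct. The hub id and the prompt below are placeholders, and the diffusion sampler itself (LLaDA's custom `generate` helper, distributed with its codebase rather than `transformers`) is only indicated in a comment:

```python
def build_chat(question: str) -> list:
    """Wrap a user turn in the chat-message format LLaDA-Instruct expects."""
    return [{"role": "user", "content": question}]

def load_and_prepare():
    """Sketch of loading the model; calling this downloads the weights."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    repo_id = "LLaDA-8B-BGPO-sudoku"  # placeholder: substitute the actual hub id
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).eval()

    prompt = tokenizer.apply_chat_template(
        build_chat("Solve this Sudoku puzzle: ..."),  # placeholder prompt
        add_generation_prompt=True,
        tokenize=False,
    )
    # Sampling then goes through LLaDA's diffusion `generate` helper (not
    # shown); the eval settings above correspond to a 256-token response,
    # 256 diffusion steps, and block length 32.
    return model, tokenizer, prompt
```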