---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- sudoku
- dllm
- bgpo
- llada
size_categories:
- 8B
---
# LLaDA-8B-BGPO-sudoku

## Model Description
LLaDA-8B-BGPO-sudoku is an 8-billion-parameter diffusion large language model (dLLM) fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its Sudoku-solving ability.
## Model Details

- Model Type: Diffusion Large Language Model (dLLM)
- Parameters: 8 billion
- Training Method: Boundary-Guided Policy Optimization (BGPO)
- Base Model: LLaDA-8B-Instruct
- Task: Sudoku
- Language: English
 
## Training Details

- Training Steps: 400
- Response Length: 256 tokens
- Train Diffusion Steps: 128
- Eval Diffusion Steps: 256
- Block Size: 32
- Monte Carlo Sample Size ($n_t$): 32
- Learning Rate: 5e-7
- Batch Size: 16
- Framework: Built on VeRL (Volcengine Reinforcement Learning)
 
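The decoding hyperparameters above imply a semi-autoregressive schedule: a 256-token response is generated block by block (8 blocks of 32 tokens), with the diffusion steps spread across blocks. The toy sketch below only counts how many masked tokens are revealed per step under an even split; it is illustrative, not the actual LLaDA/VeRL sampler (which chooses *which* tokens to unmask, e.g. by model confidence):

```python
def block_diffusion_schedule(response_len: int, block_size: int, total_steps: int):
    """Toy schedule: split `total_steps` evenly across blocks and reveal an
    equal share of each block's still-masked tokens at every step.
    Returns the number of tokens revealed at each step."""
    num_blocks = response_len // block_size      # 256 // 32 = 8 blocks
    steps_per_block = total_steps // num_blocks  # 128 // 8 = 16 steps per block
    per_step = []
    for _ in range(num_blocks):
        remaining = block_size
        for s in range(steps_per_block):
            # reveal an even share of the tokens still masked in this block
            reveal = remaining // (steps_per_block - s)
            per_step.append(reveal)
            remaining -= reveal
    return per_step

schedule = block_diffusion_schedule(256, 32, 128)
assert sum(schedule) == 256   # every token is revealed exactly once
assert len(schedule) == 128   # matches the train diffusion-step budget
```

With the training settings (128 steps for 256 tokens), each step reveals 2 tokens; the eval setting of 256 steps halves that to 1 token per step, trading speed for per-step refinement.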
## Usage & Limitations

- Primarily designed for Sudoku tasks.
- Performance may vary on tasks outside this domain.
- Requires sufficient computational resources for 8B-parameter inference.
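A minimal loading sketch, assuming this checkpoint exposes the same `trust_remote_code` interface as the base LLaDA-8B-Instruct. The hub id and the prompt below are placeholders, and the diffusion sampler itself (LLaDA's custom `generate` helper, distributed with its codebase rather than `transformers`) is only indicated in a comment:

```python
def build_chat(question: str) -> list:
    """Wrap a user turn in the chat-message format LLaDA-Instruct expects."""
    return [{"role": "user", "content": question}]

def load_and_prepare():
    """Sketch of loading the model; calling this downloads the weights."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    repo_id = "LLaDA-8B-BGPO-sudoku"  # placeholder: substitute the actual hub id
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).eval()

    prompt = tokenizer.apply_chat_template(
        build_chat("Solve this Sudoku puzzle: ..."),  # placeholder prompt
        add_generation_prompt=True,
        tokenize=False,
    )
    # Sampling then goes through LLaDA's diffusion `generate` helper (not
    # shown); the eval settings above correspond to a 256-token response,
    # 256 diffusion steps, and block length 32.
    return model, tokenizer, prompt
```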