---
license: apache-2.0
language:
  - en
tags:
  - reinforcement-learning
  - sudoku
  - dllm
  - bgpo
  - llada
size_categories:
  - 8B
---

# LLaDA-8B-BGPO-sudoku

Paper | Code

## Model Description

LLaDA-8B-BGPO-sudoku is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to enhance its Sudoku-solving capability.

## Model Details

- **Model Type:** Diffusion Large Language Model (dLLM)
- **Parameters:** 8 billion
- **Training Method:** Boundary-Guided Policy Optimization (BGPO)
- **Base Model:** LLaDA-8B-Instruct
- **Task:** Sudoku solving
- **Language:** English

## Training Details

- **Training Steps:** 400
- **Response Length:** 256 tokens
- **Training Diffusion Steps:** 128
- **Eval Diffusion Steps:** 256
- **Block Size:** 32
- **Monte Carlo Sample Size ($n_t$):** 32
- **Learning Rate:** 5e-7
- **Batch Size:** 16
- **Framework:** built on VeRL (Volcengine Reinforcement Learning)
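The hyperparameters above imply a block-wise decoding schedule: a 256-token response split into 256 / 32 = 8 blocks, with 128 training diffusion steps unmasking about 2 tokens per step (and 256 eval steps unmasking 1 per step). A minimal sketch of that arithmetic, assuming steps are divided evenly across blocks as in LLaDA-style semi-autoregressive generation (the helper name is illustrative, not from the training code):

```python
def block_schedule(response_length: int, diffusion_steps: int, block_size: int):
    """Derive a uniform unmasking schedule for block-wise diffusion decoding.

    Returns (num_blocks, steps_per_block, tokens_unmasked_per_step),
    assuming diffusion steps are split evenly across blocks.
    """
    assert response_length % block_size == 0
    num_blocks = response_length // block_size
    assert diffusion_steps % num_blocks == 0
    steps_per_block = diffusion_steps // num_blocks
    # Each step reveals block_size / steps_per_block masked tokens on average.
    tokens_per_step = block_size / steps_per_block
    return num_blocks, steps_per_block, tokens_per_step

# Training config: 256-token response, 128 steps, block size 32
print(block_schedule(256, 128, 32))  # (8, 16, 2.0)
# Eval config: 256 steps -> one token revealed per step within each block
print(block_schedule(256, 256, 32))  # (8, 32, 1.0)
```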

## Usage & Limitations

- The model is primarily designed for Sudoku solving.
- Performance on other tasks may degrade, since RL fine-tuning specialized the model for this domain.
- Inference requires GPU memory sufficient for an 8B-parameter model.
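Because Sudoku has a rule-based notion of correctness, model outputs can be verified programmatically; a validity check like the sketch below is the kind of signal typically used as an RL reward for this task (the helper is illustrative, not the repository's actual reward function):

```python
def is_valid_sudoku(grid):
    """Check that an n x n grid (n a perfect square, e.g. 4 or 9)
    is a completed, valid Sudoku solution: every row, column, and
    box contains each of 1..n exactly once."""
    n = len(grid)
    box = int(n ** 0.5)
    expected = set(range(1, n + 1))
    rows = [set(r) for r in grid]
    cols = [set(c) for c in zip(*grid)]
    boxes = [
        {grid[br + i][bc + j] for i in range(box) for j in range(box)}
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    return all(unit == expected for unit in rows + cols + boxes)

# A solved 4x4 puzzle passes; corrupting any cell makes the check fail.
solved = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(is_valid_sudoku(solved))  # True
```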