linny2002 committed on
Commit b691968 · verified · 1 Parent(s): 1a9fe0f

Create README.md

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - reinforcement-learning
+ - sudoku
+ - dllm
+ - bgpo
+ - llada
+ size_categories:
+ - 8B
+ ---
+
+ # LLaDA-8B-BGPO-sudoku
+
+ [![Paper](https://img.shields.io/badge/Paper-arXiv:-red)]()
+ [![Code](https://img.shields.io/badge/Code-GitHub-blue)]()
+
+ ## Model Description
+
+ **LLaDA-8B-BGPO-sudoku** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to improve Sudoku solving.
+
+ ## Model Details
+
+ - **Model Type**: Diffusion Large Language Model (dLLM)
+ - **Parameters**: 8 billion
+ - **Training Method**: Boundary-Guided Policy Optimization (BGPO)
+ - **Base Model**: LLaDA-8B-Instruct
+ - **Task**: Sudoku
+ - **Language**: English
+
+ ## Training Details
+
+ - **Training Steps**: 400 steps
+ - **Response Length**: 256 tokens
+ - **Train Diffusion Steps**: 128
+ - **Eval Diffusion Steps**: 256
+ - **Block Size**: 32
+ - **Monte Carlo Sample Size ($n_t$)**: 32
+ - **Learning Rate**: 5e-7
+ - **Batch Size**: 16
+ - **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
+
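The response length, block size, and diffusion step counts listed above interact. The sketch below works out the per-block arithmetic, assuming the sampler splits the diffusion steps evenly across blocks, as block-wise (semi-autoregressive) samplers for LLaDA-style models commonly do. The card does not state this, so `block_schedule` and `BlockSchedule` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSchedule:
    num_blocks: int          # blocks decoded left to right
    steps_per_block: int     # denoising steps spent on each block
    tokens_per_step: float   # average tokens unmasked per step


def block_schedule(response_length: int, block_size: int, diffusion_steps: int) -> BlockSchedule:
    """Derive the per-block schedule (ASSUMPTION: steps are split evenly across blocks)."""
    if response_length % block_size != 0:
        raise ValueError("response_length must be a multiple of block_size")
    num_blocks = response_length // block_size
    if diffusion_steps % num_blocks != 0:
        raise ValueError("diffusion_steps must be a multiple of the number of blocks")
    steps_per_block = diffusion_steps // num_blocks
    return BlockSchedule(num_blocks, steps_per_block, block_size / steps_per_block)


# Values taken from the Training Details list.
train_schedule = block_schedule(response_length=256, block_size=32, diffusion_steps=128)
eval_schedule = block_schedule(response_length=256, block_size=32, diffusion_steps=256)

print(train_schedule)
print(eval_schedule)
```

Under that assumption, training unmasks about two tokens per step within each of the 8 blocks, while evaluation unmasks one token per step.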
45
+ ## Usage & Limitations
46
+
47
+ - Primarily designed for Sudoku tasks.
48
+ - Performance may vary on other tasks.
49
+ - Requires appropriate computational resources for inference.
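
Because the model targets Sudoku, its answers can be checked mechanically once they are parsed into a grid. The following is a minimal sketch of such a check. The card specifies neither the grid size nor the output format, so `is_valid_solution` is a hypothetical helper that accepts any square grid whose side is a perfect square (4x4, 9x9, and so on), and parsing the model's text into a grid is left out.

```python
import math


def is_valid_solution(grid: list[list[int]]) -> bool:
    """Return True if every row, column, and box contains 1..n exactly once."""
    n = len(grid)
    box = math.isqrt(n)
    # The grid must be non-empty, square, and have a perfect-square side length.
    if n == 0 or box * box != n or any(len(row) != n for row in grid):
        return False
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == expected for c in range(n))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + box) for c in range(bc, bc + box)} == expected
        for br in range(0, n, box)
        for bc in range(0, n, box)
    )
    return rows_ok and cols_ok and boxes_ok


# Illustrative 4x4 grids (made up for this example, not model outputs).
solved = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
broken = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 1, 2],  # columns 3 and 4 now repeat a digit
]

print(is_valid_solution(solved))
print(is_valid_solution(broken))
```

This only checks that a filled grid is internally consistent; confirming that the solution keeps the puzzle's given clues would need the original puzzle as a second input.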