ChessFormer-RL
ChessFormer-RL is an experimental checkpoint from an attempt to train a chess model with reinforcement learning. Note: this model is actually the 8th supervised learning checkpoint (49152 steps), which was intended as the initialization for RL training; the full RL run itself encountered instabilities and was not completed.
Model Description
- Model type: Transformer for chess (RL training initialization)
- Language(s): Chess (FEN notation)
- License: MIT
- Parameters: 100.7M
Important Notice
⚠️ This model is a research checkpoint rather than a completed RL-trained model. The subsequent reinforcement learning training encountered:
- Gradient norm explosion
- Noisy reward signals
- Performance degradation from this initialization point
This checkpoint is provided for researchers interested in:
- RL training initialization strategies
- Comparative analysis with the final SL model
- Continuing RL experiments with improved methods
Architecture
Identical to ChessFormer-SL:
- Blocks: 20 transformer layers
- Hidden size: 640
- Attention heads: 8
- Intermediate size: 1728
- Features: RMSNorm, SwiGLU activation, custom FEN tokenizer
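For orientation, these hyperparameters can be collected into a small config object. The sketch below is illustrative only (the class and field names are assumptions, not the definitions in model.py), with a rough parameter-count check in the comments:

```python
from dataclasses import dataclass

@dataclass
class ChessFormerConfig:
    """Illustrative config mirroring the list above; the class and
    field names are assumptions, not the definitions in model.py."""
    num_layers: int = 20           # transformer blocks
    hidden_size: int = 640
    num_heads: int = 8             # 640 / 8 = 80 dims per head
    intermediate_size: int = 1728  # SwiGLU feed-forward width

# Rough sanity check on the 100.7M figure: per block, attention needs
# about 4 * 640^2 ≈ 1.64M weights and the SwiGLU FFN about
# 3 * 640 * 1728 ≈ 3.32M, so 20 blocks give ~99M; embeddings, norms,
# and output heads plausibly account for the remainder.
```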
Training Details
Phase 1: Supervised Learning (This Checkpoint)
- Dataset: kaupane/lichess-2023-01-stockfish-annotated (depth18 split)
- Training: 49152 steps of supervised learning on Stockfish evaluations
- Purpose: Initialization for subsequent RL training
Phase 2: Reinforcement Learning (Attempted)
- Method: Self-play with Proximal Policy Optimization (PPO)
- Environment: Batch chess environment with sparse terminal rewards (illustrated in the sketch after this list)
- Outcome: Training instabilities led to performance degradation
- Current Status: Requires further research and improved RL methodology
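To see why terminal-only rewards are noisy, note that every move in a game receives credit solely from the final result, discounted back from the end of the game. A minimal sketch (the outcome encoding and discount factor are assumptions):

```python
import torch

def terminal_returns(game_length: int, outcome: float, gamma: float = 0.99) -> torch.Tensor:
    """Discounted per-move returns when the only reward is the game result.

    outcome: +1.0 win, -1.0 loss, 0.0 draw (an assumed encoding).
    Every move's return is the terminal reward discounted back to it,
    so all credit assignment hangs on a single scalar per game.
    """
    steps_to_end = torch.arange(game_length - 1, -1, -1, dtype=torch.float32)
    return outcome * gamma ** steps_to_end

# In a 60-ply win the opening move's return is 0.99**59 ≈ 0.55,
# which says nothing about whether that particular move was good.
returns = terminal_returns(60, outcome=1.0)
```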
Training Metrics (This Checkpoint)
- Action Loss: 1.8329
- Value Loss: 0.0501
- Invalid Loss: 0.0484
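These three metrics suggest a multi-task objective: cross-entropy over the move vocabulary, value regression against Stockfish evaluations, and an auxiliary penalty on probability mass given to illegal moves. The formulation below is one plausible reading for illustration, not the repository's actual training code:

```python
import torch
import torch.nn.functional as F

def sl_losses(policy_logits, value_pred, target_move, target_value, legal_mask):
    """One plausible decomposition matching the three reported metrics.

    policy_logits: (batch, 1969) scores over the move vocabulary
    value_pred:    (batch, 1) predicted position evaluation
    target_move:   (batch,) index of the Stockfish-preferred move
    target_value:  (batch,) Stockfish evaluation target
    legal_mask:    (batch, 1969) bool, True for legal moves
    """
    # Action loss: cross-entropy against the annotated best move.
    action_loss = F.cross_entropy(policy_logits, target_move)
    # Value loss: regression toward the Stockfish evaluation.
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_value)
    # Invalid loss: probability mass assigned to illegal moves.
    probs = policy_logits.softmax(dim=-1)
    invalid_loss = (probs * (~legal_mask)).sum(dim=-1).mean()
    return action_loss, value_loss, invalid_loss
```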
Performance
As an intermediate SL checkpoint, this model exhibits:
- Similar capabilities to early ChessFormer-SL training
- Less refined than the final SL model
- Suitable for RL initialization experiments
Comparison with ChessFormer-SL
| Metric (lower is better) | ChessFormer-RL (8th ckpt) | ChessFormer-SL (20th ckpt) |
|---|---|---|
| Action Loss | 1.8329 | 1.6985 | 
| Value Loss | 0.0501 | 0.0407 | 
| Invalid Loss | 0.0484 | 0.0303 | 
Research Context
RL Training Challenges Encountered
- Gradient Instability: Explosive gradient norms during PPO updates (see the clipping sketch after this list)
- Sparse Rewards: Terminal-only rewards created noisy learning signals
- Action Space Complexity: 1,969 possible moves created exploration challenges
- Self-Play Dynamics: Unstable opponent strength during training
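Instabilities like these are commonly countered by clipping gradients and skipping pathological updates around each PPO step. A hedged sketch of such a guard (the threshold values are assumptions, not what was tried in this project):

```python
import torch

MAX_GRAD_NORM = 1.0    # assumed clipping threshold
SKIP_THRESHOLD = 50.0  # assumed cutoff for "exploded" updates

def guarded_step(model, optimizer, loss):
    """Clip gradients and skip pathological updates entirely."""
    optimizer.zero_grad()
    loss.backward()
    # clip_grad_norm_ returns the pre-clip total norm, which doubles
    # as a cheap instability signal worth logging every step.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    if grad_norm > SKIP_THRESHOLD:
        optimizer.zero_grad()  # discard the update instead of stepping into it
        return grad_norm, False
    optimizer.step()
    return grad_norm, True
```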
Usage
Installation
pip install torch transformers huggingface_hub chess
# Download model.py from this repository
Loading the Model
import torch
from model import ChessFormerModel
# Load model
model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-RL")
model.eval()
# This is an intermediate checkpoint - performance will be lower than ChessFormer-SL
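Because the policy head spans all 1,969 encoded moves, inference typically masks the logits down to the legal moves of the current position. The sketch below uses python-chess for legality; `encode_fen` and `MOVE_TO_INDEX` are hypothetical stand-ins for whatever FEN tokenizer and move-index mapping model.py actually exposes:

```python
import chess
import torch

board = chess.Board()  # starting position; any FEN works via chess.Board(fen)

# encode_fen and MOVE_TO_INDEX are hypothetical stand-ins; the output
# signature of model(...) is likewise assumed to be (policy, value).
with torch.no_grad():
    policy_logits, value = model(encode_fen(board.fen()))

legal_indices = [MOVE_TO_INDEX[m.uci()] for m in board.legal_moves]
mask = torch.full_like(policy_logits, float("-inf"))
mask[..., legal_indices] = 0.0  # keep only legal moves
best_index = (policy_logits + mask).argmax(dim=-1)
```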
For RL Research
# This checkpoint can serve as initialization for RL experiments
from train_rl import RLTrainer
# Load checkpoint for RL training continuation
trainer = RLTrainer(
    model=model,
    # ... other hyperparameters
)
trainer.resume("path/to/checkpoint", from_sl_checkpoint=True)
Limitations
Technical Limitations
- Incomplete Training: Represents intermediate rather than final model
- RL Instabilities: Subsequent RL training was unsuccessful
- Performance: Lower quality than ChessFormer-SL final checkpoint
Research Limitations
- Demonstrates challenges rather than solutions for chess RL
- Requires significant additional work for competitive performance
- Not suitable for production use
Intended Use
This model is specifically intended for:
- ✅ RL research and experimentation
- ✅ Studying initialization strategies for chess RL
- ✅ Comparative analysis of SL vs RL training trajectories
- ✅ Educational purposes in understanding RL challenges
Not intended for:
- ❌ Practical chess playing applications
- ❌ Production chess engines
- ❌ Competitive chess analysis
Additional Information
- Repository: GitHub link
- Demo: HuggingFace Space Demo
- Related: ChessFormer-SL (Completed SL Training)
This model represents ongoing research into chess RL training. While the full RL training was unsuccessful, this checkpoint may serve as a starting point for future research directions.