DiffReaper-Talk
A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction. This checkpoint covers the foundational pre-training phase on general text corpora.
Summary
DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel: instead of emitting one token per forward pass, the model denoises many positions at once. This avoids the sequential bottleneck of standard autoregressive generation.
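The card does not spell out the sampling procedure, so the following is only a minimal sketch of one common decoding scheme for mask-style discrete diffusion: every masked position is scored in a single forward pass, and the most confident predictions are committed at each step. The function name `diffusion_decode`, the `mask_id` argument, and the confidence-based schedule are illustrative assumptions, not the model's actual API.

```python
import torch

def diffusion_decode(model, prompt_ids, gen_len, mask_id, num_steps=8):
    """Iteratively denoise a fully masked block, revealing tokens in parallel.

    At each step the model scores every masked position at once; the most
    confident predictions are committed, and the rest stay masked for the
    next step. After `num_steps` passes the whole block is filled in.
    (Hypothetical sketch, not the released sampler.)
    """
    device = prompt_ids.device
    x = torch.cat([
        prompt_ids,
        torch.full((gen_len,), mask_id, dtype=prompt_ids.dtype, device=device),
    ])
    masked = torch.zeros_like(x, dtype=torch.bool)
    masked[prompt_ids.numel():] = True

    for step in range(num_steps):
        logits = model(x.unsqueeze(0)).squeeze(0)   # (seq_len, vocab_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)

        remaining = masked.nonzero(as_tuple=True)[0]
        if remaining.numel() == 0:
            break

        # Commit a fraction of the still-masked positions each step,
        # choosing the highest-confidence predictions first.
        k = max(1, remaining.numel() // (num_steps - step))
        commit = remaining[conf[remaining].topk(k).indices]
        x[commit] = pred[commit]
        masked[commit] = False

    return x
```

The number of denoising steps is the usual quality/latency knob in this family of samplers: fewer steps commit more tokens per forward pass but give the model fewer chances to revise low-confidence positions.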
Technical Details
- Architecture: 24-Layer Transformer Encoder
- Embedding Dim: 2048
- Heads: 16
- Parameters: ~1.5 Billion
- Hardware: 1× NVIDIA A100 (80 GB VRAM)
- Objective: Markovian Discrete Denoising (Continuous Embedding Space)
- Precision: Mixed BF16
- Context Window: 1024 Tokens
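For orientation, here is a rough configuration sketch built from the hyperparameters listed above, using PyTorch's stock encoder modules. The vocabulary size and the 4× feed-forward expansion are assumptions, and the resulting parameter count is only approximate; this is not the released implementation.

```python
import torch.nn as nn

# Hypothetical configuration mirroring the specs listed above.
class DiffReaperTalkConfig:
    vocab_size = 32_000        # assumed; not stated in the card
    num_layers = 24
    embed_dim = 2048
    num_heads = 16
    context_window = 1024
    ffn_dim = 4 * embed_dim    # common 4x expansion; assumption

encoder_layer = nn.TransformerEncoderLayer(
    d_model=DiffReaperTalkConfig.embed_dim,
    nhead=DiffReaperTalkConfig.num_heads,
    dim_feedforward=DiffReaperTalkConfig.ffn_dim,
    batch_first=True,
)
encoder = nn.TransformerEncoder(
    encoder_layer, num_layers=DiffReaperTalkConfig.num_layers
)

# Rough parameter count for the encoder stack (token embeddings excluded);
# with embeddings this lands in the ~1.5B range quoted above.
n_params = sum(p.numel() for p in encoder.parameters())
print(f"encoder parameters: {n_params / 1e9:.2f}B")
```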
Current Status
Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.