DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during its foundational pre-training phase.

Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This approach avoids the sequential bottleneck of standard autoregressive generation.
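To make the parallel-prediction idea concrete, here is a minimal sketch of one denoising step for a masked discrete-diffusion LM: the model scores every position in a single forward pass and commits the most confident masked slots, rather than emitting one token at a time. The `model` callable, `mask_id`, and the confidence-based unmasking schedule are illustrative assumptions, not the actual DiffReaper-Talk API.

```python
import torch

@torch.no_grad()
def denoise_step(model, tokens, mask_id, frac_to_unmask=0.25):
    """One parallel refinement step: fill the most confident masked slots.

    `tokens` is a 1-D LongTensor of token ids with `mask_id` placeholders;
    `model` maps (1, seq_len) ids to (1, seq_len, vocab) logits.
    """
    logits = model(tokens.unsqueeze(0)).squeeze(0)      # one forward pass over the whole sequence
    probs = logits.softmax(dim=-1)
    confidence, candidates = probs.max(dim=-1)          # best token and its probability per position

    masked = tokens.eq(mask_id)
    if not masked.any():
        return tokens                                   # fully denoised
    confidence = confidence.masked_fill(~masked, float("-inf"))

    # Commit the k most confident predictions in parallel; the remaining
    # positions stay masked and are revisited on the next step.
    k = max(1, int(masked.sum().item() * frac_to_unmask))
    chosen = confidence.topk(k).indices
    out = tokens.clone()
    out[chosen] = candidates[chosen]
    return out
```

Repeating this step until no masked positions remain yields the full sequence in a handful of passes instead of one pass per token, which is where the parallelism over autoregressive decoding comes from.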

Technical Details

  • Architecture: 24-Layer Transformer Encoder
  • Embedding Dim: 2048
  • Heads: 16
  • Parameters: ~1.5 Billion
  • Hardware: 1x NVIDIA A100 (80GB VRAM)
  • Objective: Markovian Discrete Denoising (Continuous Embedding Space)
  • Precision: Mixed BF16
  • Context Window: 1024 Tokens
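For orientation, the following is a minimal PyTorch sketch of a bidirectional encoder sized to the hyper-parameters listed above (24 layers, embedding dim 2048, 16 heads, 1024-token context). The vocabulary size, feed-forward width, and time-step conditioning are not stated in this card, so those values are placeholders rather than the published configuration.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 32_000   # placeholder; the actual tokenizer size is not listed above
MAX_LEN    = 1024     # context window from the spec list

class DenoiserBackbone(nn.Module):
    """Encoder sized to the spec list: 24 layers, d_model 2048, 16 heads."""

    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, 2048)
        self.pos_emb = nn.Embedding(MAX_LEN, 2048)
        layer = nn.TransformerEncoderLayer(
            d_model=2048, nhead=16, dim_feedforward=8192,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=24)
        self.lm_head = nn.Linear(2048, VOCAB_SIZE)

    def forward(self, token_ids):                        # (batch, seq_len) -> (batch, seq_len, vocab)
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.tok_emb(token_ids) + self.pos_emb(pos)
        return self.lm_head(self.encoder(h))             # no causal mask: every position attends to the full context
```

With these sizes the backbone lands in the ~1.5B-parameter range quoted above; the mixed-BF16 precision from the spec would typically be applied around the forward/backward pass (e.g. with `torch.autocast`) rather than inside the module definition.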

Current Status

Phase 2 (Logic) is complete. Domain-specific training (Code) will be applied post-convergence.
