Pretrained baselines for sequence modeling research.
- puigde/gated-deltanet-360M-15B-slimpajama
  Text Generation • 0.4B
- puigde/rwkv7-380M-15B-slimpajama
  Text Generation • 0.4B
- puigde/modern-transformer-gqa-370M-15B-slimpajama
  Text Generation • 0.4B
- puigde/modern-transformer-mha-370M-15B-slimpajama
  Text Generation • 0.4B