Description

A 2-layer Mamba2 model distilled from JunxiongWang/Llama3.2-Mamba2-3B-distill, with early stopping at 48,000 steps.

Used as the draft model for speculative decoding of hybrid models in STree: Speculative Tree Decoding for Hybrid State-Space Models.
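To illustrate the draft-model role, here is a minimal sketch of plain speculative decoding: the small draft model cheaply proposes a block of tokens, and the large target model verifies them, keeping the longest agreeing prefix. The functions `draft_next`, `target_next`, and `speculative_step` are toy stand-ins over a tiny integer vocabulary, not the STree API.

```python
def draft_next(prefix):
    # Toy draft model: deterministic rule, next token = (last + 1) mod 10.
    return (prefix[-1] + 1) % 10

def target_next(prefix):
    # Toy target model: agrees with the draft except right after token 5.
    if prefix[-1] == 5:
        return 0
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    # Draft proposes k tokens autoregressively (the cheap phase).
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    # Target verifies each proposed token in order; accept while they match,
    # and on the first mismatch substitute the target's own token and stop.
    accepted = list(prefix)
    for i in range(len(prefix), len(proposal)):
        t = target_next(accepted)
        accepted.append(t)
        if proposal[i] != t:
            break
    return accepted

print(speculative_step([3], k=4))  # → [3, 4, 5, 0]
```

Each accepted draft token saves one full forward pass of the target model; STree extends this idea to tree-shaped proposals for hybrid state-space models.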

For more details on installation, training, and evaluation, please refer to the GitHub repository.
