Description
A 2-layer Mamba2 model distilled from JunxiongWang/Llama3.2-Mamba2-3B-distill, with early stopping at step 48,000.
It is used as a draft model for speculative decoding of hybrid models in STree: Speculative Tree Decoding for Hybrid State-Space Models.
For more details on installation, training, and evaluation, please refer to the GitHub repository.
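As a rough usage sketch, a draft-model forward pass might look like the following. This assumes the checkpoint loads through transformers' `AutoModelForCausalLM` with `trust_remote_code=True`; the GitHub repository may instead provide its own loading code, and the prompt and generation parameters here are illustrative only.

```python
# Minimal sketch of running the draft model on its own.
# Assumption: the checkpoint is loadable via transformers; the repo's custom
# loading code may be required instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ycwu97/mamba2-distilled-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # hybrid Mamba2 layers may need custom modeling code
)

prompt = "Speculative decoding works by"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # In STree-style speculative decoding, these draft tokens would be
    # verified by the larger hybrid target model.
    draft_tokens = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(draft_tokens[0], skip_special_tokens=True))
```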