Did anyone get speculative decode working?
#10 opened 3 days ago by amit864

Successfully Running Qwen3-Next-80B-A3B-Instruct-AWQ-4bit on 3x RTX 3090s
#9 opened 13 days ago by 8055izham

Sorta works on vLLM now
#8 opened 26 days ago by MrDragonFox

Recent update throws error: KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
#7 opened 28 days ago by itsmebcc

Gibberish still persists?
#6 opened about 1 month ago by Geximus

MTP Accepted throughput always at 0.00 tokens/s
#5 opened about 1 month ago by bpozdena

Experiencing excessive response latency.
#4 opened about 1 month ago by JunHowie

Does this quantized version support running on GPUs like the V100 and V100S?
#3 opened about 1 month ago by ShaoShuoHe

Error when inputting many prompts
#2 opened about 1 month ago by dwaynedu

Error when running in vLLM
#1 opened about 1 month ago by d8rt8v
