Bloody hell!! Running perfectly on 3x 3090 at 160k context, speeds between 30 and 65 tok/s (depending on length), my script:
#11 opened 4 days ago by groxaxo

Did anyone get speculative decoding working?
#10 opened 11 days ago by amit864, 1 comment

Successfully Running Qwen3-Next-80B-A3B-Instruct-AWQ-4bit on 3x RTX 3090s
#9 opened 21 days ago by 8055izham, 4 comments

Sorta works on vLLM now
#8 opened about 1 month ago by MrDragonFox, 14 comments

Recent update throws error: KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
#7 opened about 1 month ago by itsmebcc, 3 comments

Gibberish still persists?
#6 opened about 1 month ago by Geximus, 5 comments

MTP accepted throughput always at 0.00 tokens/s
#5 opened about 1 month ago by bpozdena, 4 comments

Experiencing excessive response latency.
#4 opened about 1 month ago by JunHowie, 4 comments

Does this quantized version support running on machines like V100 and V100S?
#3 opened about 1 month ago by ShaoShuoHe, 1 comment

Error when inputting many prompts
#2 opened about 1 month ago by dwaynedu

Error when running in vLLM
#1 opened about 1 month ago by d8rt8v, 18 comments
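
Several of the threads above (#11, #9, #8) revolve around serving this AWQ 4-bit quant through vLLM on 3x RTX 3090s. As a rough orientation only, here is a minimal, untested sketch using vLLM's Python API; the repo id is a placeholder, and the parallelism degree, context length, and memory fraction are assumptions taken from the thread titles, not verified working values.

```python
# Minimal sketch, not a confirmed recipe: loading the AWQ 4-bit quant
# across three GPUs with vLLM's Python API. The model id below is a
# placeholder; substitute the actual Hugging Face repo id.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen3-Next-80B-A3B-Instruct-AWQ-4bit",  # placeholder repo id
    quantization="awq",
    tensor_parallel_size=3,       # one shard per RTX 3090; 3-way TP may not
                                  # divide the head count evenly, in which
                                  # case pipeline parallelism is the usual fallback
    max_model_len=160_000,        # the 160k context reported in thread #11
    gpu_memory_utilization=0.95,  # assumed; tune down if loading OOMs
)

params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(out[0].outputs[0].text)
```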