
cpatonn/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit

Tags: Text Generation · Transformers · Safetensors · qwen3_next · conversational · compressed-tensors
Community discussions (11)
Bloody hell!! Running perfectly on 3x 3090 at 160k context, speeds between 65 tk/s and 30 tk/s (depending on length). My script:

#11 opened 4 days ago by groxaxo
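The poster's script itself isn't shown in this listing. As a general illustration only, a vLLM launch for serving this quant across three GPUs might look like the sketch below; every flag value here is an assumption, not the poster's actual configuration.

```shell
# Hypothetical vLLM launch (illustrative flags, not the poster's script).
# Note: --tensor-parallel-size must evenly divide the model's attention
# head count, so an odd GPU count like 3 is often split with pipeline
# parallelism instead.
vllm serve cpatonn/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit \
  --pipeline-parallel-size 3 \
  --max-model-len 160000 \
  --gpu-memory-utilization 0.95
```

Memory utilization and context length would need tuning to the actual VRAM headroom on 24 GB cards.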

Did anyone get speculative decode working?

πŸ‘€ 1
1
#10 opened 11 days ago by
amit864

Successfully Running Qwen3-Next-80B-A3B-Instruct-AWQ-4bit on 3x RTX 3090s

❀️ 🀝 4
4
#9 opened 21 days ago by
8055izham

Sorta works on vLLM now

πŸ‘ 1
14
#8 opened about 1 month ago by
MrDragonFox

Recent update throws error: KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'

#7 opened about 1 month ago by itsmebcc

Gibberish still persists?

#6 opened about 1 month ago by Geximus

MTP Accepted throughput always at 0.00 tokens/s

#5 opened about 1 month ago by bpozdena

Experiencing excessive response latency

πŸ‘ 4
#4 opened about 1 month ago by
JunHowie

Does this quantized version support running on machines like V100 and V100S?

βž• 1
#3 opened about 1 month ago by
ShaoShuoHe

Error when inputting many prompts

#2 opened about 1 month ago by dwaynedu

Error when running in vLLM

πŸ‘ 2
18
#1 opened about 1 month ago by
d8rt8v