Need4Speed

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity


wenhuach
posted an update 19 days ago
AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.

Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, since AutoAWQ is no longer maintained and manually configuring new models is not trivial because custom layer mappings are required.
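The recommendation above can be sketched with AutoRound's Python API. This is a minimal sketch, not the post author's exact workflow: the model name, quantization settings, and output path are illustrative assumptions, and it assumes a recent `auto-round` release that supports `quantize_and_save` with `format="auto_awq"`.

```python
# Hedged sketch: exporting an AWQ-format INT4 checkpoint with AutoRound.
# Model name, settings, and output path are illustrative assumptions;
# check the AutoRound README for the exact API of your installed version.

def awq_int4_settings(bits=4, group_size=128, sym=True):
    """Typical INT4 settings for an AWQ export (illustrative defaults)."""
    return {"bits": bits, "group_size": group_size, "sym": sym}

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from auto_round import AutoRound  # pip install auto-round

    model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumption: any HF causal LM
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    autoround = AutoRound(model, tokenizer, **awq_int4_settings())
    # "auto_awq" asks AutoRound to emit an AWQ-compatible checkpoint,
    # which inference engines such as vLLM can then load directly.
    autoround.quantize_and_save("./Qwen2.5-7B-Instruct-awq-int4",
                                format="auto_awq")
```

The exported directory can then be served like any other AWQ checkpoint.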
loubnabnl
posted an update 23 days ago
wenhuach
posted an update about 1 month ago
wenhuach
posted an update 3 months ago
Check out the [DeepSeek-R1 INT2 model](OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.

| Benchmark | BF16 | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu | 0.8514 | 0.8302 |
| hellaswag | 0.6935 | 0.6657 |
| winogrande | 0.7932 | 0.7940 |
| arc_challenge | 0.6212 | 0.6084 |
wenhuach
posted an update 4 months ago
wenhuach
posted an update 6 months ago
wenhuach
posted an update 6 months ago
AutoRound has demonstrated strong results even at 2-bit precision for vision-language models (VLMs) like Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
wenhuach
posted an update 6 months ago
This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

wenhuach
posted an update 6 months ago
The OPEA space just released nearly 20 INT4 models, for example QwQ-32B-Preview,
Llama-3.2-11B-Vision-Instruct, Qwen2.5, Llama 3.1, and more. Check out OPEA.