Model size 12B #5
by amine-khelif - opened
Why does the model size say 12B instead of 20B?
B ≠ GB
Maybe HF doesn’t handle the new format yet
It is probably due to mixed precision.
The expert weights are 4-bit but packed as U8, so you have to double the param count from each of those tensors.
Yes, that's correct — we can check the parameter shapes in the model.safetensors.index.json file, and for the U8-packed 4-bit expert weights, we need to double the parameter count accordingly.
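A minimal sketch of that accounting: given a map of tensor name to (shape, dtype), double the element count of any U8 tensor, since each U8 byte packs two 4-bit values. The tensor names and shapes below are made up for illustration, not taken from the actual checkpoint.

```python
import math

# Hypothetical tensor inventory: name -> (shape, dtype).
# In a real checkpoint you'd read these from the safetensors headers;
# the names and shapes here are illustrative only.
tensors = {
    "model.layers.0.self_attn.q_proj.weight": ((2048, 2048), "BF16"),
    # 4-bit expert weights packed two-per-byte, so the last dim is halved.
    "model.layers.0.mlp.experts.gate_up_proj.blocks": ((32, 5760, 1440), "U8"),
}

def param_count(tensors):
    total = 0
    for name, (shape, dtype) in tensors.items():
        n = math.prod(shape)
        if dtype == "U8":
            # Each U8 element holds two 4-bit parameters.
            n *= 2
        total += n
    return total

print(param_count(tensors))
```

With real shapes, summing the naive element counts gives ~12B, while doubling the packed expert tensors recovers the ~20B figure.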