Part of the Runs On My 32GB Mac collection: some smarter, and some MoEs (71 items).
The qx86-hi quant sits between q6 and q8 in size, and the differences are minor.
It is less boring than q8 or bf16.
On a standard coding task (Haskell), it returned on average noticeably more tokens on the first prompt than the q8 and bf16 quants, with valuable code commentary and documentation.
The speed is CPU-limited; all quants are vibing at about the same pace:
| Quantization | Perplexity        | Tokens | tok/sec |
|--------------|-------------------|--------|---------|
| bf16         | 679.767 ± 14.542  | 13337  | 95.57   |
| q8-hi        | 632.627 ± 13.342  | 13828  | 96.63   |
| qx86-hi      | 571.714 ± 11.821  | 15602  | 97.97   |
| q6-hi        | 495.759 ± 10.161  | 15886  | 106.89  |
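Numbers like the tok/sec column can be collected with a small timing loop around mlx-lm's generate. This is a minimal sketch, not the exact benchmark used above: only the qx86-hi repo name comes from this card, and the sibling repo names, prompt, and max_tokens are illustrative.

```python
import time

from mlx_lm import load, generate

# Only the qx86-hi name appears on this card; the sibling repo names,
# the prompt, and max_tokens below are illustrative.
QUANTS = [
    "VibeCoder-20b-RL1_0-q6-hi-mlx",
    "VibeCoder-20b-RL1_0-qx86-hi-mlx",
    "VibeCoder-20b-RL1_0-q8-hi-mlx",
]

PROMPT = "Write a small Haskell module that parses RFC 3339 timestamps."

for repo in QUANTS:
    model, tokenizer = load(repo)
    messages = [{"role": "user", "content": PROMPT}]
    tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    start = time.perf_counter()
    text = generate(model, tokenizer, prompt=tokens, max_tokens=2048)
    elapsed = time.perf_counter() - start

    # Wall-clock throughput over the generated text (includes prompt processing).
    n_out = len(tokenizer.encode(text))
    print(f"{repo}: {n_out} tokens, {n_out / elapsed:.2f} tok/sec")
```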
Detailed metrics coming soon
-G
This model VibeCoder-20b-RL1_0-qx86-hi-mlx was converted to MLX format from EpistemeAI/VibeCoder-20b-RL1_0 using mlx-lm version 0.28.2.
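For context, a plain quantized export with mlx-lm looks roughly like the sketch below; the output path and quantization settings are illustrative, and the exact qx86-hi mixed-precision recipe is the author's own and is not reproduced by this command.

```bash
# Plain 8-bit export shown for illustration; the qx86-hi quant uses a
# mixed-precision recipe that this single command does not reproduce.
mlx_lm.convert \
    --hf-path EpistemeAI/VibeCoder-20b-RL1_0 \
    --mlx-path VibeCoder-20b-RL1_0-q8-mlx \
    -q --q-bits 8 --q-group-size 64
```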
Use with mlx:

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("VibeCoder-20b-RL1_0-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
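With verbose=True, generate also prints prompt and generation tokens-per-second plus peak memory after each call, which is a quick way to get per-quant throughput numbers comparable to the tok/sec column above.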
Base model: openai/gpt-oss-20b