Part of the Runs On My 32GB Mac collection: some smarter, and some MoEs (71 items).
The qx86-hi quant sits between q6 and q8 in size, and the differences are minor.
It is less boring than q8 or bf16.
On a standard coding task (Haskell), it returned on average noticeably more tokens on the first prompt than the q8 and bf16 quants, with valuable code commentary and documentation.
The speed is CPU-limited; all quants are vibing at about the same pace:
| Quantization | Perplexity        | Tokens | tok/sec |
|--------------|-------------------|--------|---------|
| bf16         | 679.767 ± 14.542  | 13337  | 95.57   |
| q8-hi        | 632.627 ± 13.342  | 13828  | 96.63   |
| qx86-hi      | 571.714 ± 11.821  | 15602  | 97.97   |
| q6-hi        | 495.759 ± 10.161  | 15886  | 106.89  |
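Numbers like the tok/sec column can be collected with a small timing loop around mlx-lm's generate. This is a minimal sketch, not the exact benchmark used above: only the qx86-hi repo name comes from this card, and the sibling repo names, prompt, and max_tokens are illustrative.

```python
import time

from mlx_lm import load, generate

# Only the qx86-hi name appears on this card; the sibling repo names,
# the prompt, and max_tokens below are illustrative.
QUANTS = [
    "VibeCoder-20b-RL1_0-q6-hi-mlx",
    "VibeCoder-20b-RL1_0-qx86-hi-mlx",
    "VibeCoder-20b-RL1_0-q8-hi-mlx",
]

PROMPT = "Write a small Haskell module that parses RFC 3339 timestamps."

for repo in QUANTS:
    model, tokenizer = load(repo)
    messages = [{"role": "user", "content": PROMPT}]
    tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    start = time.perf_counter()
    text = generate(model, tokenizer, prompt=tokens, max_tokens=2048)
    elapsed = time.perf_counter() - start

    # Wall-clock throughput over the generated text (includes prompt processing).
    n_out = len(tokenizer.encode(text))
    print(f"{repo}: {n_out} tokens, {n_out / elapsed:.2f} tok/sec")
```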
Detailed metrics coming soon
-G
This model VibeCoder-20b-RL1_0-qx86-hi-mlx was converted to MLX format from EpistemeAI/VibeCoder-20b-RL1_0 using mlx-lm version 0.28.2.
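For context, a plain quantized export with mlx-lm looks roughly like the sketch below; the output path and quantization settings are illustrative, and the exact qx86-hi mixed-precision recipe is the author's own and is not reproduced by this command.

```bash
# Plain 8-bit export shown for illustration; the qx86-hi quant uses a
# mixed-precision recipe that this single command does not reproduce.
mlx_lm.convert \
    --hf-path EpistemeAI/VibeCoder-20b-RL1_0 \
    --mlx-path VibeCoder-20b-RL1_0-q8-mlx \
    -q --q-bits 8 --q-group-size 64
```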
Use with mlx:

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("VibeCoder-20b-RL1_0-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
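With verbose=True, generate also prints prompt and generation tokens-per-second plus peak memory after each call, which is a quick way to get per-quant throughput numbers comparable to the tok/sec column above.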
Base model: openai/gpt-oss-20b