Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B.
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.
Key Advantages: - VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM). - Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.
Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology in the models' cards.
We are hiring at Shirova AI. We need AI researchers and engineers to work in our research lab. Shirova AI is a research lab in India, so we can help our researchers move to nearby workspaces or let them work from home without ever coming to the lab. We're building our founding team, so the pay will be good. You can learn, so don't hesitate to mail us at: careers@shirova.com