---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
- vllm
- mlx
base_model: openai/gpt-oss-120b
---

# gpt-oss-120b-qx64-mlx

The reason I created the qx64 and qx65 quants: I wanted to write some Perl as a Postgres function. Most other quants simplify the task and offer really well-written PL/pgSQL instead. But I wanted PL/Perl. I am that guy.

The [qx65 quant](https://huggingface.co/nightmedia/gpt-oss-120b-qx65-mlx) gave me what I asked for. It followed instructions.

Then I asked the qx64 the same question, showed it this prompt, and asked why it followed my instructions. The qx65 gave me a very short, clean answer I could put as a comment in the code. The qx64 gave me the history of PL/Perl and all the nice things I could do with it.

Until performance metrics are available, please use these models with caution.

-G

```bash
75.26 tok/sec
9338 tokens
2.58s to first token
```

This model [gpt-oss-120b-qx64-mlx](https://huggingface.co/gpt-oss-120b-qx64-mlx) was converted to MLX format from [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) using mlx-lm version **0.27.1** (a conversion sketch is included at the end of this card).

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from a local path or Hugging Face repo
model, tokenizer = load("gpt-oss-120b-qx64-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
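If you would rather not write Python, mlx-lm also ships a command-line generator. A minimal sketch, assuming the standard `mlx_lm.generate` entry point and the model path used above; the prompt text and `--max-tokens` value are illustrative, and you should check `mlx_lm.generate --help` for the exact flags in your installed version:

```bash
# Generate directly from the terminal using mlx-lm's CLI entry point
mlx_lm.generate \
  --model gpt-oss-120b-qx64-mlx \
  --prompt "Write a PL/Perl Postgres function that reverses a string" \
  --max-tokens 512
```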
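For reference, a quantized MLX conversion can be produced with mlx-lm's converter. This is a hedged sketch only: the flags below are the standard `mlx_lm.convert` options and yield a uniform quantization, not the mixed qx64 recipe used for this repo, and the output path is a placeholder.

```bash
# Uniform 4-bit conversion with mlx-lm (illustrative; the qx64 mix was produced separately)
mlx_lm.convert \
  --hf-path openai/gpt-oss-120b \
  --mlx-path gpt-oss-120b-4bit-mlx \
  -q --q-bits 4 --q-group-size 64
```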