---
library_name: mlx
pipeline_tag: text-generation
inference: false
license: apache-2.0
base_model: openai/gpt-oss-120b
language:
- en
- ro
tags:
- apple-silicon
- metal
- arm64
- bf16
- mlx
- mlx-lm
- openai
- halley-ai
---

# gpt-oss-120b — MLX bf16 (non-quantized)

**Summary.** This is a non-quantized MLX conversion of gpt-oss-120B in bfloat16 (bf16). Built for Apple Silicon with Metal acceleration.

- **Base model:** `openai/gpt-oss-120b` (Apache-2.0)
- **Precision:** bfloat16 (no quantization)
- **Files:** MLX weight shards + `config.json`; tokenizer files included for drop-in use
- **Intended use:** local inference / research on M-series Macs
- **Not intended for:** safety-critical decisions; outputs may be inaccurate or biased

## Requirements

Runs on Apple Silicon (M1 or newer) with macOS ≥ 13.5 via MLX (Metal).

- Not supported: Intel macOS / Linux / Windows (consider a GGUF build + llama.cpp instead).
- Memory guidance: large unified memory recommended (e.g., 64–96 GB). The effective GPU working set is capped by Metal’s budget; keep 5–10% headroom (see the sketch below).

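
The available budget varies by machine and macOS settings; a quick way to see it is to ask MLX for the device limits before loading. A minimal sketch, assuming a recent `mlx` that exposes `mx.metal.device_info()` (treat the key names as illustrative; they may differ across versions):

```python
# Rough pre-flight check of the Metal memory budget before loading the model.
# Keys assumed from mlx.core.metal.device_info(); adjust for your mlx version.
import mlx.core as mx

info = mx.metal.device_info()
budget_gb = info["max_recommended_working_set_size"] / 1024**3  # Metal working-set cap
total_gb = info["memory_size"] / 1024**3                        # unified memory
print(f"Unified memory: {total_gb:.0f} GB, Metal working-set budget: {budget_gb:.0f} GB")
# Keep 5-10% of the budget free for the KV cache and activations.
```
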

## How to use (MLX)

```bash
pip install mlx-lm
```

```python
# Python API (uses tokenizer bundled with this repo)
from mlx_lm import load, generate

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-bf16")
print(generate(
    model, tokenizer,
    prompt="Explain the Chudnovsky algorithm to compute π.",
    max_tokens=256, max_kv_size=512,
))
```

```bash
# CLI
python -m mlx_lm generate --model halley-ai/gpt-oss-120b-MLX-bf16 \
  --prompt "Explain the Chudnovsky algorithm to compute pi." \
  --max-kv-size 512 --max-tokens 256
```

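
For chat-style use you can also route messages through the tokenizer's chat template instead of passing a raw prompt. A minimal sketch with an illustrative message, assuming the standard `mlx_lm` Python API:

```python
# Chat-style prompting via the bundled chat template (message content and
# sampling settings are illustrative, not a prescribed configuration).
from mlx_lm import load, generate

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-bf16")

messages = [
    {"role": "user", "content": "Summarize the Chudnovsky algorithm in two sentences."}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```
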

## Evaluation

Perplexity (PPL) streaming evaluation on WikiText-2 (raw, test); fast preset with `window=stride=4096`, ~100k tokens, EOS inserted between docs.

| Variant              | PPL (ctx=4096, fast) |
|----------------------|----------------------|
| MLX bf16 (non-quant) | 7.38                 |
| MLX 8-bit (gs=32)    | 7.39                 |

Notes:

- Results from local runs on Apple Silicon using MLX; numbers vary slightly with tokenizer details, logits dtype, and token subset.
- For more sensitive comparisons, use overlapping windows (e.g., `--stride 512`) and evaluate the full split.

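
The exact evaluation script is not included here; a rough sketch of the non-overlapping-window setup described above (dataset loading via Hugging Face `datasets`, tokenization and EOS handling are assumptions, not the authors' script) could look like this:

```python
# Hypothetical reconstruction of the fast PPL preset: window == stride == 4096,
# documents concatenated with EOS, ~100k tokens scored.
import math
import mlx.core as mx
from mlx_lm import load
from datasets import load_dataset  # assumes `datasets` is installed

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-bf16")
docs = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]

# Concatenate documents, separated by EOS, then truncate to the fast-preset budget.
ids = []
for doc in docs:
    if doc.strip():
        ids.extend(tokenizer.encode(doc) + [tokenizer.eos_token_id])
ids = ids[:100_000]

window = 4096
total_nll, total_tokens = 0.0, 0
for start in range(0, len(ids) - 1, window):
    chunk = ids[start : start + window + 1]
    if len(chunk) < 2:
        break
    x = mx.array(chunk[:-1])[None]   # inputs
    y = mx.array(chunk[1:])[None]    # next-token targets
    logits = model(x).astype(mx.float32)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    nll = -mx.take_along_axis(logprobs, y[..., None], axis=-1)
    total_nll += nll.sum().item()
    total_tokens += y.size

print("PPL:", math.exp(total_nll / total_tokens))
```
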

## Conversion details (provenance)

```bash
python -m mlx_lm convert \
  --hf-path openai/gpt-oss-120b \
  --mlx-path gpt-oss-120b-MLX-bf16 \
  --dtype bfloat16
```

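
Once the conversion finishes, the output directory can be loaded by local path for a quick smoke test (illustrative; assumes the command above was run in the current working directory):

```python
# Smoke-test the freshly converted weights straight from the local output directory.
from mlx_lm import load, generate

model, tokenizer = load("gpt-oss-120b-MLX-bf16")  # local path from the convert step
print(generate(model, tokenizer, prompt="Hello", max_tokens=16))
```
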

## Sibling & reference models

- halley-ai/gpt-oss-120b-MLX-8bit-gs32 (int8, group size 32)

## Limitations & biases

Outputs may be factually wrong or unsafe. Do not use for medical, legal, or financial decisions without human review. Large models can be sensitive to prompts; prefer explicit instructions and structure.

## License & credits

- License: Apache-2.0 (inherits from base model)
- Base model: OpenAI gpt-oss-120B
- Conversion: Halley AI Lab (MLX bf16)
- Please cite both the base model and this repository when you use the weights.