danielhanchen committed · verified
Commit 90e0d61 · 1 Parent(s): 7100229

Add files using upload-large-folder tool

Q4_K_M/Qwen3-235B-A22B-Q4_K_M-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:540a34b4540ee2586f3565d11f5758bbef405ae915eac38ffc4ac7ba1360022c
- size 49944699712
+ oid sha256:a8bc43ded2ef4868ea31302ddbfa8cd45dbbeecb476f4372337f41fe51b8a1cb
+ size 49944699648
README.md CHANGED
@@ -1,24 +1,14 @@
  ---
- base_model: Qwen/Qwen3-235B-A22B
- language:
- - en
- library_name: transformers
- license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
- license: apache-2.0
  tags:
- - qwen3
- - qwen
  - unsloth
- - transformers
+ base_model:
+ - Qwen/Qwen3-235B-A22B
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
+ pipeline_tag: text-generation
  ---
-
  <div>
- <p style="margin-bottom: 0; margin-top: 0;">
- <strong>See <a href="https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>
- </p>
- <p style="margin-bottom: 0;">
- <em>Learn to run Qwen3 correctly - <a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">Read our Guide</a>.</em>
- </p>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>
@@ -33,47 +23,13 @@ tags:
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
- <h1 style="margin-top: 0rem;">✨ Run & Fine-tune Qwen3 with Unsloth!</h1>
  </div>

- - Fine-tune Qwen3 (14B) for free using our Google [Colab notebook here](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- - Read our Blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- - Run & export your fine-tuned model to Ollama, llama.cpp or HF.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|-----------------|-------------|------------|
- | **Qwen3 (14B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 70% less |
- | **GRPO with Qwen3 (8B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 80% less |
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Phi-4 (14B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 50% less |
-
- # To Switch Between Thinking and Non-Thinking
- If you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
-
- Here is an example of multi-turn conversation:
-
- ```
- > Who are you /no_think
-
- <think>
-
- </think>
-
- I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]
-
- > How many 'r's are in 'strawberries'? /think
-
- <think>
- Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
- </think>
-
- The word strawberries contains 3 instances of the letter r. [...]
- ```

  # Qwen3-235B-A22B
+ <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Qwen3 Highlights

@@ -157,21 +113,23 @@ print("thinking content:", thinking_content)
  print("content:", content)
  ```

- For deployment, you can use `vllm>=0.8.5` or `sglang>=0.4.5.post2` to create an OpenAI-compatible API endpoint:
- - vLLM:
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
  ```shell
- vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
+ python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser qwen3 --tp 8
  ```
- - SGLang:
+ - vLLM:
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser deepseek-r1
+ vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```
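
Once either server is running, any OpenAI-compatible client can query it. Below is a minimal sketch (an editorial addition, not part of this commit), assuming the vLLM command above is serving on its default port 8000 and the `openai` Python package is installed; for SGLang, point `base_url` at its default `http://localhost:30000/v1` instead:

```python
# Minimal client for the OpenAI-compatible endpoint started above.
# Assumptions: vLLM's default address http://localhost:8000/v1; the dummy
# api_key is ignored by local servers but required by the client library.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",  # must match the model name being served
    messages=[{"role": "user", "content": "Briefly introduce yourself. /think"}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```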
+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also added support for Qwen3.
+
  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by vLLM and SGLang.
- > Please refer to our documentation for [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) and [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) users.
+ > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
+ > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

@@ -269,7 +227,7 @@ if __name__ == "__main__":
  print(f"Bot: {response_3}")
  ```

- > **Note**
+ > [!NOTE]
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.
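
A client that wants the reasoning and the answer separately therefore has to split on those tags itself. Here is a small sketch (editorial, not from the commit) of that parsing, using only the tag format the note describes: an always-present but possibly empty `<think>...</think>` block when thinking is enabled, and no block at all when it is disabled:

```python
# Split a raw completion into (thinking_content, answer) per the note above.
import re

def split_thinking(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # enable_thinking=False: the model emits no <think> block at all.
        return "", text.strip()
    # enable_thinking=True: block is always present, possibly empty (/no_think).
    return match.group(1).strip(), text[match.end():].strip()

thinking, answer = split_thinking("<think>\n\n</think>\n\nI am Qwen. [...]")
print(repr(thinking))  # '' (empty reasoning after a /no_think turn)
print(answer)          # 'I am Qwen. [...]'
```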

@@ -339,7 +297,7 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
  {
    ...,
    "rope_scaling": {
-     "type": "yarn",
+     "rope_type": "yarn",
      "factor": 4.0,
      "original_max_position_embeddings": 32768
    }
@@ -351,12 +309,12 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers

  For `vllm`, you can use
  ```shell
- vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
+ vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
- python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
+ python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```
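
For a local checkout, the same `rope_scaling` block can be written into `config.json` with a few lines of Python. A sketch under stated assumptions (the directory path is hypothetical; only the JSON keys and values come from the diff above):

```python
# Add the YaRN rope_scaling block shown above to a local config.json.
# The extended context follows from factor * original window:
# 4.0 * 32768 = 131072, matching --max-model-len in the vllm command.
import json
from pathlib import Path

config_path = Path("Qwen3-235B-A22B/config.json")  # hypothetical local path
config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "rope_type": "yarn",  # key renamed from "type" in this commit
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

config_path.write_text(json.dumps(config, indent=2) + "\n")
```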

  For `llama-server` from `llama.cpp`, you can use
UD-Q2_K_XL/Qwen3-235B-A22B-UD-Q2_K_XL-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a71fe713905dd8ae4eed73d08ad678efaa3a3d39eecc0211ec8b49551ef60315
- size 49841583360
+ oid sha256:87e40d68e84690074c728b2a205f448e69362848fb8f6f9acfd990eab81382c0
+ size 49841583328
UD-Q3_K_XL/Qwen3-235B-A22B-UD-Q3_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5c84ecae31bfab63449cb33c6ac514fcc0887e0e3c522da7956e73fee635fe0b
- size 49966873952
+ oid sha256:22650dab2363949c1b620b89e4896d3840d1d9365ca190af74ab92800d45a97f
+ size 49966873920
UD-Q4_K_XL/Qwen3-235B-A22B-UD-Q4_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:633a1a7b6c3c4e2c53f9f3cfeedb4527a79aeda9d88e767e6a71a9a0dfffd06f
- size 49875808736
+ oid sha256:2f2717c62998244f2564a8dd5e7beb637671bbd3ccfafcac0658fd437c382737
+ size 49875808704
UD-Q5_K_XL/Qwen3-235B-A22B-UD-Q5_K_XL-00001-of-00004.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2adedf41f5224917f2e71435a41e9b8f27873ea0dd2aa719c0f5143bcc99228a
- size 49835132704
+ oid sha256:6e5c319f04db1b9c4fb072d6c9978cf377dd067c38324f6c6fd3a28b3f6694b7
+ size 49835132640
config.json CHANGED
@@ -4,7 +4,6 @@
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
- "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
@@ -37,4 +36,4 @@
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
- }
+ }