danielhanchen committed · verified
Commit 90e0d61 · 1 Parent(s): 7100229

Add files using upload-large-folder tool

Q4_K_M/Qwen3-235B-A22B-Q4_K_M-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:540a34b4540ee2586f3565d11f5758bbef405ae915eac38ffc4ac7ba1360022c
- size 49944699712
+ oid sha256:a8bc43ded2ef4868ea31302ddbfa8cd45dbbeecb476f4372337f41fe51b8a1cb
+ size 49944699648
README.md CHANGED
@@ -1,24 +1,14 @@
  ---
- base_model: Qwen/Qwen3-235B-A22B
- language:
- - en
- library_name: transformers
- license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
- license: apache-2.0
  tags:
- - qwen3
- - qwen
  - unsloth
- - transformers
+ base_model:
+ - Qwen/Qwen3-235B-A22B
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
+ pipeline_tag: text-generation
  ---
-
  <div>
- <p style="margin-bottom: 0; margin-top: 0;">
- <strong>See <a href="https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>
- </p>
- <p style="margin-bottom: 0;">
- <em>Learn to run Qwen3 correctly - <a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">Read our Guide</a>.</em>
- </p>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>
@@ -33,47 +23,13 @@ tags:
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
- <h1 style="margin-top: 0rem;">✨ Run & Fine-tune Qwen3 with Unsloth!</h1>
  </div>

- - Fine-tune Qwen3 (14B) for free using our Google [Colab notebook here](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- - Read our Blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- - Run & export your fine-tuned model to Ollama, llama.cpp or HF.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|-----------------|-------------|------------|
- | **Qwen3 (14B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 70% less |
- | **GRPO with Qwen3 (8B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 80% less |
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Phi-4 (14B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 50% less |
-
- # To Switch Between Thinking and Non-Thinking
- If you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
-
- Here is an example of multi-turn conversation:
-
- ```
- > Who are you /no_think
-
- <think>
-
- </think>
-
- I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]
-
- > How many 'r's are in 'strawberries'? /think
-
- <think>
- Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
- </think>
-
- The word strawberries contains 3 instances of the letter r. [...]
- ```

  # Qwen3-235B-A22B
+ <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Qwen3 Highlights

@@ -157,21 +113,23 @@ print("thinking content:", thinking_content)
  print("content:", content)
  ```

- For deployment, you can use `vllm>=0.8.5` or `sglang>=0.4.5.post2` to create an OpenAI-compatible API endpoint:
- - vLLM:
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
  ```shell
- vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
+ python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser qwen3 --tp 8
  ```
- - SGLang:
+ - vLLM:
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser deepseek-r1
+ vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```
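
Once either server is running, any OpenAI-compatible client can query it. Below is a minimal sketch (an editorial addition, not part of this commit), assuming the vLLM command above is serving on its default port 8000 and the `openai` Python package is installed; for SGLang, point `base_url` at its default `http://localhost:30000/v1` instead:

```python
# Minimal client for the OpenAI-compatible endpoint started above.
# Assumptions: vLLM's default address http://localhost:8000/v1; the dummy
# api_key is ignored by local servers but required by the client library.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",  # must match the model name being served
    messages=[{"role": "user", "content": "Briefly introduce yourself. /think"}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```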
+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also added support for Qwen3.
+
  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by vLLM and SGLang.
- > Please refer to our documentation for [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) and [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) users.
+ > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
+ > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

@@ -269,7 +227,7 @@ if __name__ == "__main__":
  print(f"Bot: {response_3}")
  ```

- > **Note**
+ > [!NOTE]
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.
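
A client that wants the reasoning and the answer separately therefore has to split on those tags itself. Here is a small sketch (editorial, not from the commit) of that parsing, using only the tag format the note describes: an always-present but possibly empty `<think>...</think>` block when thinking is enabled, and no block at all when it is disabled:

```python
# Split a raw completion into (thinking_content, answer) per the note above.
import re

def split_thinking(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # enable_thinking=False: the model emits no <think> block at all.
        return "", text.strip()
    # enable_thinking=True: block is always present, possibly empty (/no_think).
    return match.group(1).strip(), text[match.end():].strip()

thinking, answer = split_thinking("<think>\n\n</think>\n\nI am Qwen. [...]")
print(repr(thinking))  # '' (empty reasoning after a /no_think turn)
print(answer)          # 'I am Qwen. [...]'
```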

@@ -339,7 +297,7 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
  {
    ...,
    "rope_scaling": {
-     "type": "yarn",
+     "rope_type": "yarn",
      "factor": 4.0,
      "original_max_position_embeddings": 32768
    }
@@ -351,12 +309,12 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers

  For `vllm`, you can use
  ```shell
- vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
+ vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
- python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
+ python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```
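
For a local checkout, the same `rope_scaling` block can be written into `config.json` with a few lines of Python. A sketch under stated assumptions (the directory path is hypothetical; only the JSON keys and values come from the diff above):

```python
# Add the YaRN rope_scaling block shown above to a local config.json.
# The extended context follows from factor * original window:
# 4.0 * 32768 = 131072, matching --max-model-len in the vllm command.
import json
from pathlib import Path

config_path = Path("Qwen3-235B-A22B/config.json")  # hypothetical local path
config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "rope_type": "yarn",  # key renamed from "type" in this commit
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

config_path.write_text(json.dumps(config, indent=2) + "\n")
```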

  For `llama-server` from `llama.cpp`, you can use
UD-Q2_K_XL/Qwen3-235B-A22B-UD-Q2_K_XL-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a71fe713905dd8ae4eed73d08ad678efaa3a3d39eecc0211ec8b49551ef60315
- size 49841583360
+ oid sha256:87e40d68e84690074c728b2a205f448e69362848fb8f6f9acfd990eab81382c0
+ size 49841583328
UD-Q3_K_XL/Qwen3-235B-A22B-UD-Q3_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5c84ecae31bfab63449cb33c6ac514fcc0887e0e3c522da7956e73fee635fe0b
- size 49966873952
+ oid sha256:22650dab2363949c1b620b89e4896d3840d1d9365ca190af74ab92800d45a97f
+ size 49966873920
UD-Q4_K_XL/Qwen3-235B-A22B-UD-Q4_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:633a1a7b6c3c4e2c53f9f3cfeedb4527a79aeda9d88e767e6a71a9a0dfffd06f
- size 49875808736
+ oid sha256:2f2717c62998244f2564a8dd5e7beb637671bbd3ccfafcac0658fd437c382737
+ size 49875808704
UD-Q5_K_XL/Qwen3-235B-A22B-UD-Q5_K_XL-00001-of-00004.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2adedf41f5224917f2e71435a41e9b8f27873ea0dd2aa719c0f5143bcc99228a
- size 49835132704
+ oid sha256:6e5c319f04db1b9c4fb072d6c9978cf377dd067c38324f6c6fd3a28b3f6694b7
+ size 49835132640
config.json CHANGED
@@ -4,7 +4,6 @@
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
- "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
@@ -37,4 +36,4 @@
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
- }
+ }