📦 Model Card: Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking
| Key Property | Value |
|---|---|
| Model ID | Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking |
| License | apache-2.0 |
| Author(s) | Jackrong, gpt‑oss team, Qwen authors |
| Base Model | gpt-oss-120b-high (teacher model; source of the distilled complex-reasoning dataset) |
| Target Size | ~ 4B parameters (Qwen3‑4B distilled version) |
🔍 Overview
A deeply distilled and fine-tuned variant of the large language model gpt-oss-120b-high, optimized for human‑friendly, high‑fidelity reasoning. The model preserves the original’s multi‑step thinking patterns while compressing them into a lightweight 4B‑parameter backbone (the “Distill‑Qwen3” architecture). Its signature feature is an explicit point‑by‑point thought chain that makes intricate logic transparent and easy to follow, ideal for education, technical support, and analytical tasks.
💡 Think of it as the “thinking mode” you’d expect from a massive model, compressed into a 4B‑parameter package.
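A minimal quick-start sketch, assuming the checkpoint loads through the standard 🤗 Transformers causal-LM API; the prompt and generation settings below are illustrative assumptions, not official recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking"

# Load tokenizer and model; device_map="auto" places weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt with the model's own chat template.
messages = [
    {"role": "user", "content": "Why does ice float on water? Think step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters here are assumptions, not tuned values.
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```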
🛠️ Technical Details
| Aspect | Specification |
|---|---|
| Source Model | gpt-oss‑120b‑high (teacher; source of the complex-reasoning dataset) |
| Distillation Target | Qwen3‑4B architecture |
| Supervised Fine‑Tuning (SFT) | ~ 30,000 examples drawn from the source’s high‑fidelity reasoning corpus |
| Training Hardware | Single NVIDIA H100‑80GB GPU |
| Max Context Length | 32,768 tokens, enabling multi‑paragraph, long‑form reasoning without truncation (see the streaming sketch after this table) |
| Reasoning Style | Default: Bullet‑point “thought chain” output (e.g., • Step 1 → …\n• Step 2 → …) |
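Because the 32,768‑token window permits very long thought chains, streaming output is often more practical than waiting for the full completion. A sketch using Transformers' `TextStreamer`; the prompt and token cap are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Stream tokens to stdout as they arrive, so long thought chains are visible early.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [{"role": "user", "content": "Derive the quadratic formula, step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 4096 new tokens is an arbitrary cap, leaving ample headroom in the 32,768-token window.
model.generate(inputs, streamer=streamer, max_new_tokens=4096)
```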
🎯 Recommended Use Cases
| Use case | Why it fits |
|---|---|
| Technical tutorials | Leverage bullet‑point logic for stepwise code walkthroughs |
| Complex queries (e.g., math, engineering) | The model’s deep reasoning helps avoid oversimplified answers |
| User education | Clear, scannable outputs aid learning and reduce confusion |
| Moderation/analysis | The structured format makes it easier to parse responses programmatically |
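For the moderation/analysis row, the default bullet thought chain lends itself to lightweight parsing. A hypothetical sketch: the `extract_steps` helper and the `• Step N → …` pattern are assumptions based on the format documented above, and real outputs may deviate:

```python
import re

def extract_steps(response: str) -> list[str]:
    """Split a bullet-point thought chain into individual reasoning steps.

    Assumes lines shaped like '• Step 1 → ...'; the pattern is a guess
    based on the format documented in this card, not a guaranteed contract.
    """
    pattern = re.compile(r"^\s*•\s*Step\s*\d+\s*→\s*(.+)$", re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(response)]

sample = "• Step 1 → Restate the problem\n• Step 2 → List knowns\n• Step 3 → Solve"
print(extract_steps(sample))
# ['Restate the problem', 'List knowns', 'Solve']
```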
📚 Credits & Contributors
- gpt‑oss team: Provided the high‑fidelity complex‑reasoning dataset.
- Qwen3 authors: Provided the open‑source Qwen3 architecture used as the distillation target.
- Jackrong: Implemented the final SFT and packaging for Hugging Face Hub.