Instructions to use QuantFactory/Math-IIO-7B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuantFactory/Math-IIO-7B-Instruct-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

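# Load model directly
# GGUF repos typically contain only .gguf files, so pass the file name (needs `pip install gguf`)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "QuantFactory/Math-IIO-7B-Instruct-GGUF", gguf_file="Math-IIO-7B-Instruct.Q2_K.gguf", dtype="auto"
)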

llama-cpp-python

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Math-IIO-7B-Instruct-GGUF",
	filename="Math-IIO-7B-Instruct.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
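
`create_chat_completion` returns an OpenAI-style response dictionary rather than a plain string. The following is a minimal sketch of pulling the assistant text out of it, assuming the usual `choices[0].message.content` layout of a non-streaming response. The helper name `reply_text` and the sample dictionary are illustrative and not part of llama-cpp-python.

```python
from typing import Any, Dict


def reply_text(response: Dict[str, Any]) -> str:
    """Return the assistant message from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Non-streaming chat completions carry the text under message.content.
    return choices[0]["message"]["content"]


# Hypothetical response, shaped like the dict a chat completion call returns.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris"},
            "finish_reason": "stop",
        }
    ]
}
print(reply_text(sample))
```

With a loaded model, the same helper would be applied to the return value of `llm.create_chat_completion(...)`.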

Local Apps

llama.cpp

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M


vLLM

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/Math-IIO-7B-Instruct-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Math-IIO-7B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
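
The curl call above can also be issued from Python. This is a minimal sketch using only the standard library, assuming a server is already listening on the port shown above. The function names `build_chat_request` and `ask` are illustrative. Nothing is sent until `ask` is called.

```python
import json
import urllib.request

MODEL_ID = "QuantFactory/Math-IIO-7B-Instruct-GGUF"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the request to a running server and return the assistant reply."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` mirrors the curl example. Other OpenAI-compatible servers in this page differ only in `base_url`, for example port 30000 for SGLang.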

Use Docker

docker model run hf.co/QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

SGLang

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantFactory/Math-IIO-7B-Instruct-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Math-IIO-7B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantFactory/Math-IIO-7B-Instruct-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Math-IIO-7B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Math-IIO-7B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Math-IIO-7B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Math-IIO-7B-Instruct-GGUF to start chatting

Pi

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
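
If you switch quantizations often, the JSON above can be generated rather than edited by hand. This is a small sketch that rebuilds the same structure for an arbitrary model tag. The field names are copied from the config above, while the function name `pi_models_config` is hypothetical. Writing the result to `~/.pi/agent/models.json` is left to the reader.

```python
import json


def pi_models_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Build the llama-cpp provider entry shown above for a given model tag."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


config = pi_models_config("QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M")
print(json.dumps(config, indent=2))
```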

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Math-IIO-7B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Math-IIO-7B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Math-IIO-7B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

Improve language tag

by lbourdois - opened Apr 28, 2025

base: refs/heads/main ← from: refs/pr/2


README.md +89 -80


@@ -1,80 +1,89 @@
----
-license: creativeml-openrail-m
-datasets:
-- prithivMLmods/Math-IIO-68K-Mini
-language:
-- en
-base_model:
-- Qwen/Qwen2.5-7B-Instruct
-pipeline_tag: text-generation
-library_name: transformers
-tags:
-- safetensors
-- qwen2.5
-- 7B
-- Instruct
-- Math
-- CoT
-- one-shot
----
-[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
-# QuantFactory/Math-IIO-7B-Instruct-GGUF
-This is quantized version of [prithivMLmods/Math-IIO-7B-Instruct](https://huggingface.co/prithivMLmods/Math-IIO-7B-Instruct) created using llama.cpp
-# Original Model Card
-![aaa.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/faLfR-doaWP_BLUkOQrbq.png)
-### **Math IIO 7B Instruct**
-The **Math IIO 7B Instruct** is a fine-tuned language model based on the robust **Qwen2.5-7B-Instruct** architecture. This model has been specifically trained to excel in single-shot mathematical reasoning and instruction-based tasks, making it a reliable choice for educational, analytical, and problem-solving applications.
-### **Key Features:**
-1. **Math-Optimized Capabilities:**
-   The model is designed to handle complex mathematical problems, step-by-step calculations, and reasoning tasks.
-2. **Instruction-Tuned:**
-   Fine-tuned for better adherence to structured queries and task-oriented prompts, enabling clear and concise outputs.
-3. **Large Vocabulary:**
-   Equipped with an extensive tokenizer configuration and custom tokens to ensure precise mathematical notation support.
-| File Name                          | Size       | Description                                   | Upload Status  |
-|------------------------------------|------------|-----------------------------------------------|----------------|
-| `.gitattributes`                   | 1.57 kB    | Git attributes configuration file             | Uploaded       |
-| `README.md`                        | 263 Bytes  | README file with minimal details              | Updated        |
-| `added_tokens.json`                | 657 Bytes  | Custom added tokens for tokenizer             | Uploaded       |
-| `config.json`                      | 861 Bytes  | Model configuration file                      | Uploaded       |
-| `generation_config.json`           | 281 Bytes  | Configuration for text generation settings    | Uploaded       |
-| `merges.txt`                       | 1.82 MB    | Merge rules for byte pair encoding tokenizer  | Uploaded       |
-| `pytorch_model-00001-of-00004.bin` | 4.88 GB    | First part of model weights (PyTorch)         | Uploaded (LFS) |
-| `pytorch_model-00002-of-00004.bin` | 4.93 GB    | Second part of model weights (PyTorch)        | Uploaded (LFS) |
-| `pytorch_model-00003-of-00004.bin` | 4.33 GB    | Third part of model weights (PyTorch)         | Uploaded (LFS) |
-| `pytorch_model-00004-of-00004.bin` | 1.09 GB    | Fourth part of model weights (PyTorch)        | Uploaded (LFS) |
-| `pytorch_model.bin.index.json`     | 28.1 kB    | Index JSON file for model weights             | Uploaded       |
-| `special_tokens_map.json`          | 644 Bytes  | Map of special tokens used by the tokenizer   | Uploaded       |
-| `tokenizer.json`                   | 11.4 MB    | Tokenizer settings and vocab                  | Uploaded (LFS) |
-| `tokenizer_config.json`            | 7.73 kB    | Configuration for tokenizer                   | Uploaded       |
-| `vocab.json`                       | 2.78 MB    | Vocabulary for tokenizer                      | Uploaded       |
-### **Training Details:**
-- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](#)
-- **Dataset:** Trained on **Math-IIO-68K-Mini**, a curated dataset with 68.8k high-quality examples focusing on mathematical instructions, equations, and logic-based queries.
-### **Capabilities:**
-- **Problem-Solving:** Solves mathematical problems ranging from basic arithmetic to advanced calculus and linear algebra.
-- **Educational Use:** Explains solutions step-by-step, making it a valuable teaching assistant.
-- **Analysis & Reasoning:** Handles logical reasoning tasks and computational queries effectively.
-### **How to Use:**
-1. Download all model files, ensuring the PyTorch weights and tokenizer configurations are included.
-2. Load the model in your Python environment using frameworks like PyTorch or Hugging Face Transformers.
-3. Use the provided configurations (`config.json` and `generation_config.json`) for optimal inference.

+---
+license: creativeml-openrail-m
+datasets:
+- prithivMLmods/Math-IIO-68K-Mini
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- safetensors
+- qwen2.5
+- 7B
+- Instruct
+- Math
+- CoT
+- one-shot
+---
+[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+# QuantFactory/Math-IIO-7B-Instruct-GGUF
+This is quantized version of [prithivMLmods/Math-IIO-7B-Instruct](https://huggingface.co/prithivMLmods/Math-IIO-7B-Instruct) created using llama.cpp
+# Original Model Card
+![aaa.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/faLfR-doaWP_BLUkOQrbq.png)
+### **Math IIO 7B Instruct**
+The **Math IIO 7B Instruct** is a fine-tuned language model based on the robust **Qwen2.5-7B-Instruct** architecture. This model has been specifically trained to excel in single-shot mathematical reasoning and instruction-based tasks, making it a reliable choice for educational, analytical, and problem-solving applications.
+### **Key Features:**
+1. **Math-Optimized Capabilities:**
+   The model is designed to handle complex mathematical problems, step-by-step calculations, and reasoning tasks.
+2. **Instruction-Tuned:**
+   Fine-tuned for better adherence to structured queries and task-oriented prompts, enabling clear and concise outputs.
+3. **Large Vocabulary:**
+   Equipped with an extensive tokenizer configuration and custom tokens to ensure precise mathematical notation support.
+| File Name                          | Size       | Description                                   | Upload Status  |
+|------------------------------------|------------|-----------------------------------------------|----------------|
+| `.gitattributes`                   | 1.57 kB    | Git attributes configuration file             | Uploaded       |
+| `README.md`                        | 263 Bytes  | README file with minimal details              | Updated        |
+| `added_tokens.json`                | 657 Bytes  | Custom added tokens for tokenizer             | Uploaded       |
+| `config.json`                      | 861 Bytes  | Model configuration file                      | Uploaded       |
+| `generation_config.json`           | 281 Bytes  | Configuration for text generation settings    | Uploaded       |
+| `merges.txt`                       | 1.82 MB    | Merge rules for byte pair encoding tokenizer  | Uploaded       |
+| `pytorch_model-00001-of-00004.bin` | 4.88 GB    | First part of model weights (PyTorch)         | Uploaded (LFS) |
+| `pytorch_model-00002-of-00004.bin` | 4.93 GB    | Second part of model weights (PyTorch)        | Uploaded (LFS) |
+| `pytorch_model-00003-of-00004.bin` | 4.33 GB    | Third part of model weights (PyTorch)         | Uploaded (LFS) |
+| `pytorch_model-00004-of-00004.bin` | 1.09 GB    | Fourth part of model weights (PyTorch)        | Uploaded (LFS) |
+| `pytorch_model.bin.index.json`     | 28.1 kB    | Index JSON file for model weights             | Uploaded       |
+| `special_tokens_map.json`          | 644 Bytes  | Map of special tokens used by the tokenizer   | Uploaded       |
+| `tokenizer.json`                   | 11.4 MB    | Tokenizer settings and vocab                  | Uploaded (LFS) |
+| `tokenizer_config.json`            | 7.73 kB    | Configuration for tokenizer                   | Uploaded       |
+| `vocab.json`                       | 2.78 MB    | Vocabulary for tokenizer                      | Uploaded       |
+### **Training Details:**
+- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](#)
+- **Dataset:** Trained on **Math-IIO-68K-Mini**, a curated dataset with 68.8k high-quality examples focusing on mathematical instructions, equations, and logic-based queries.
+### **Capabilities:**
+- **Problem-Solving:** Solves mathematical problems ranging from basic arithmetic to advanced calculus and linear algebra.
+- **Educational Use:** Explains solutions step-by-step, making it a valuable teaching assistant.
+- **Analysis & Reasoning:** Handles logical reasoning tasks and computational queries effectively.
+### **How to Use:**
+1. Download all model files, ensuring the PyTorch weights and tokenizer configurations are included.
+2. Load the model in your Python environment using frameworks like PyTorch or Hugging Face Transformers.
+3. Use the provided configurations (`config.json` and `generation_config.json`) for optimal inference.
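
The only functional change in this diff is the `language` list in the front matter: the single tag `en` is replaced by thirteen three-letter codes. The sketch below maps those codes to language names, assuming they are ISO 639-3 identifiers (the PR does not say so explicitly). The dictionary and the `describe` helper are illustrative.

```python
# Codes copied from the removed and added `language:` blocks in the diff.
OLD_LANGUAGE_TAGS = ["en"]
NEW_LANGUAGE_TAGS = [
    "zho", "eng", "fra", "spa", "por", "deu", "ita",
    "rus", "jpn", "kor", "vie", "tha", "ara",
]

# English names for the codes, assuming ISO 639-3.
LANGUAGE_NAMES = {
    "zho": "Chinese",
    "eng": "English",
    "fra": "French",
    "spa": "Spanish",
    "por": "Portuguese",
    "deu": "German",
    "ita": "Italian",
    "rus": "Russian",
    "jpn": "Japanese",
    "kor": "Korean",
    "vie": "Vietnamese",
    "tha": "Thai",
    "ara": "Arabic",
}


def describe(tags):
    """Return the English name for each tag, or 'unknown' if it is not in the table."""
    return [LANGUAGE_NAMES.get(tag, "unknown") for tag in tags]


print(len(NEW_LANGUAGE_TAGS), "languages:", ", ".join(describe(NEW_LANGUAGE_TAGS)))
```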