---
library_name: gguf
license: apache-2.0
base_model:
- Ellbendls/Qwen-3-4b-Text_to_SQL
- Qwen/Qwen3-4B-Instruct-2507
tags:
- gguf
- llama.cpp
- qwen
- text-to-sql
- sql
- instruct
language:
- eng
- zho
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
pipeline_tag: text-generation
---

# Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF

Quantized GGUF builds of `Ellbendls/Qwen-3-4b-Text_to_SQL` for fast CPU/GPU inference with llama.cpp-compatible runtimes.

- **Base model**. Fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** for Text-to-SQL.
- **License**. Apache-2.0 (inherited from the base model). Keep attribution.
- **Purpose**. Turn natural language into SQL. When no schema is given, the model can infer a simple one and then produce SQL.

## Files

Base and quantized variants:

- `Qwen-3-4b-Text_to_SQL-F16.gguf` – reference float16 export
- `Qwen-3-4b-Text_to_SQL-q2_k.gguf`
- `Qwen-3-4b-Text_to_SQL-q3_k_m.gguf`
- `Qwen-3-4b-Text_to_SQL-q4_k_s.gguf`
- `Qwen-3-4b-Text_to_SQL-q4_k_m.gguf`  ← good default
- `Qwen-3-4b-Text_to_SQL-q5_k_m.gguf`
- `Qwen-3-4b-Text_to_SQL-q6_k.gguf`
- `Qwen-3-4b-Text_to_SQL-q8_0.gguf`    ← near-lossless, larger

Conversion and quantization were performed with `llama.cpp`.

## Recommended pick

- **Q4_K_M**. Best balance of speed and quality for laptops and small servers.
- **Q5_K_M**. Higher quality, a bit more RAM/VRAM.
- **Q8_0**. Highest quality among quants. Use if you have headroom.

## Approximate memory needs

These are ballpark figures for a 4B model; real usage varies with the runtime and context length.

- Q4_K_M: 3–4 GB RAM/VRAM
- Q5_K_M: 4–5 GB
- Q8_0: 6–8 GB
- F16: 10–12 GB
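
The weights-only footprint can be sanity-checked as parameter count × bits per weight. The bits-per-weight averages below are rough assumptions for the k-quant formats, and the totals exclude the KV cache and runtime buffers, which is why real usage runs higher than these estimates:

```python
# Rough weights-only size estimate: params * bits-per-weight / 8.
# Bits-per-weight values are approximate averages for each quant format
# (assumptions, not exact figures); KV cache and buffers are not modeled.
PARAMS = 4e9  # ~4 billion parameters for a Qwen3-4B model

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def approx_gib(quant: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 2**30

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{approx_gib(quant):.1f} GiB")
```

The gap between these estimates and the table above is the runtime overhead (context, buffers), which grows with `n_ctx`.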

## Quick start

### llama.cpp (CLI)

CPU only:
```bash
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
  -p "Generate SQL to get average salary by department in 2024." \
  -n 256 -t 6
```

NVIDIA GPU offload (build with `-DGGML_CUDA=ON`; older llama.cpp releases used `-DLLAMA_CUBLAS=ON`):

```bash
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
  -p "Generate SQL to get average salary by department in 2024." \
  -n 256 -ngl 999 -t 6
```

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="Qwen-3-4b-Text_to_SQL-q4_k_m.gguf", n_ctx=4096, n_gpu_layers=35)  # set 0 for CPU-only
prompt = "Generate SQL to list total orders and revenue by month for 2024."
out = llm(prompt, max_tokens=256, temperature=0.2, top_p=0.9)
print(out["choices"][0]["text"].strip())
```

### LM Studio / Kobold / text-generation-webui

* Select the `.gguf` file and load.
* Set temperature 0.1–0.3 for deterministic SQL.
* Use a system prompt to anchor behavior.
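
One way to anchor behavior, sketched here with llama-cpp-python's chat API, is to pin the schema in the system prompt. The table and column names below are hypothetical; substitute your real DDL:

```python
def build_messages(schema_ddl: str, question: str) -> list:
    """Build a chat prompt that anchors the model to a fixed schema."""
    system = (
        "You are a Text-to-SQL assistant. Reply with a single SQL query only.\n"
        f"Schema:\n{schema_ddl}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Hypothetical schema for illustration; replace with your real DDL.
ddl = "CREATE TABLE employees (id INTEGER, department TEXT, salary REAL, hired DATE);"
messages = build_messages(ddl, "Average salary by department for staff hired in 2024.")
# Then pass to llama-cpp-python, e.g.:
#   llm.create_chat_completion(messages=messages, temperature=0.2, max_tokens=256)
```

Keeping the schema in the system message (rather than repeating it per user turn) also makes multi-question sessions cheaper.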

## Model details

* **Base**. `Qwen/Qwen3-4B-Instruct-2507` (32k context, multilingual).
* **Fine-tune**. Trained on `gretelai/synthetic_text_to_sql`.
* **Task**. NL → SQL. Capable of simple schema inference when needed.
* **Languages**. Works best in English; multilingual prompting is inherited from the base model.

## Conversion reproducibility

Export used:

```bash
python convert_hf_to_gguf.py /path/to/hf_model --outtype f16 --outfile Qwen-3-4b-Text_to_SQL-F16.gguf
```

Quantization used:

```bash
./llama-quantize Qwen-3-4b-Text_to_SQL-F16.gguf Qwen-3-4b-Text_to_SQL-q4_k_m.gguf Q4_K_M
# likewise for q2_k, q3_k_m, q4_k_s, q5_k_m, q6_k, q8_0
```

## Intended use and limits

* **Use**. Analytics, reporting, dashboards, data exploration, SQL prototyping.
* **Limits**. No database connectivity. It only generates SQL text. Validate and test queries before use in production. Provide real schema for best accuracy.
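
One cheap validation step before running generated SQL anywhere real: compile it against a throwaway in-memory SQLite database built from your schema. This is a sketch assuming SQLite-compatible SQL; `EXPLAIN` compiles the statement without executing the underlying query, so it catches syntax errors and unknown tables or columns:

```python
import sqlite3

def syntax_check(sql: str, schema_ddl: str) -> bool:
    """Compile generated SQL against an in-memory copy of the schema.

    EXPLAIN forces SQLite to prepare the statement without running it,
    flagging syntax errors and references to missing tables/columns.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(schema_ddl)
        con.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        con.close()

# Hypothetical schema and queries for illustration.
ddl = "CREATE TABLE employees (id INTEGER, department TEXT, salary REAL);"
print(syntax_check("SELECT department, AVG(salary) FROM employees GROUP BY department", ddl))  # True
print(syntax_check("SELEC department FROM employees", ddl))  # False
```

This only checks that the query compiles; it says nothing about whether the SQL answers the question, so human review is still needed.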

## Attribution

* Base model: [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
* Fine-tuned model: [`Ellbendls/Qwen-3-4b-Text_to_SQL`](https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL)

## License

Apache-2.0. Include license and NOTICE from upstream when redistributing the weights. Do not imply endorsement from Qwen or original authors.

## Changelog

* 2025-09-17. Initial GGUF release with `q2_k`, `q3_k_m`, `q4_k_s`, `q4_k_m`, `q5_k_m`, `q6_k`, `q8_0`, and F16 builds.
