Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF
Quantized GGUF builds of Ellbendls/Qwen-3-4b-Text_to_SQL
for fast CPU/GPU inference with llama.cpp-compatible runtimes.
- Base model. Fine-tuned from Qwen/Qwen3-4B-Instruct-2507 for Text-to-SQL.
- License. Apache-2.0 (inherits from base). Keep attribution.
- Purpose. Turns natural language into SQL. When a schema is not provided, the model can infer a simple schema and then produce SQL.
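Results are usually better when the schema is stated explicitly rather than left to inference. A minimal sketch of a schema-grounded prompt (the table, columns, and wording below are illustrative, not a prescribed format):

```python
# Hypothetical example schema; substitute your real CREATE TABLE statements.
SCHEMA = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL,
    hired_on DATE
);"""

def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Compose a Text-to-SQL prompt with the schema stated explicitly,
    so the model does not have to guess table or column names."""
    return (
        "Given the following database schema:\n"
        f"{schema}\n\n"
        f"Write a SQL query to answer: {question}\n"
        "Return only the SQL."
    )

print(build_prompt("What is the average salary by department?"))
```

Pass the resulting string as the prompt in any of the runtimes below.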
Files
Base and quantized variants:
- Qwen-3-4b-Text_to_SQL-F16.gguf (reference float16 export)
- Qwen-3-4b-Text_to_SQL-q2_k.gguf
- Qwen-3-4b-Text_to_SQL-q3_k_m.gguf
- Qwen-3-4b-Text_to_SQL-q4_k_s.gguf
- Qwen-3-4b-Text_to_SQL-q4_k_m.gguf (good default)
- Qwen-3-4b-Text_to_SQL-q5_k_m.gguf
- Qwen-3-4b-Text_to_SQL-q6_k.gguf
- Qwen-3-4b-Text_to_SQL-q8_0.gguf (near-lossless, larger)
Conversion and quantization were done with llama.cpp.
Recommended pick
- Q4_K_M. Best balance of speed and quality for laptops and small servers.
- Q5_K_M. Higher quality, a bit more RAM/VRAM.
- Q8_0. Highest quality among quants. Use if you have headroom.
Approximate memory needs
These are ballpark for a 4B model. Real usage varies by runtime and context length.
- Q4_K_M: 3–4 GB RAM/VRAM
- Q5_K_M: 4–5 GB
- Q8_0: 6–8 GB
- F16: 10–12 GB
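The figures above roughly follow from parameter count times bits per weight. A back-of-the-envelope sketch (the bits-per-weight values and the 20% overhead factor are assumptions; real usage grows with context length):

```python
def approx_model_ram_gb(n_params: float, bits_per_weight: float,
                        overhead: float = 1.2) -> float:
    """Rough memory estimate: weights = params * bits / 8, plus ~20%
    for runtime buffers. KV cache for long contexts adds more on top."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

# Approximate effective bits per weight for each quant (assumed values).
for name, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{approx_model_ram_gb(4e9, bpw):.1f} GB")
```

This counts weights only, which is why the table's upper bounds sit a little higher.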
Quick start
llama.cpp (CLI)
CPU only:
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
-p "Generate SQL to get average salary by department in 2024." \
-n 256 -t 6
NVIDIA GPU offload (build with -DGGML_CUDA=ON; older llama.cpp releases used -DLLAMA_CUBLAS=ON):
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
-p "Generate SQL to get average salary by department in 2024." \
-n 256 -ngl 999 -t 6
Python (llama-cpp-python)
from llama_cpp import Llama
llm = Llama(model_path="Qwen-3-4b-Text_to_SQL-q4_k_m.gguf", n_ctx=4096, n_gpu_layers=35) # set 0 for CPU-only
prompt = "Generate SQL to list total orders and revenue by month for 2024."
out = llm(prompt, max_tokens=256, temperature=0.2, top_p=0.9)
print(out["choices"][0]["text"].strip())
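Instruction-tuned models sometimes wrap the query in a markdown fence or add commentary, so it helps to extract the SQL before executing it. A small sketch (the regex-based approach is an assumption, not part of this model's output contract):

```python
import re

def extract_sql(text: str) -> str:
    """Pull the SQL out of a model response that may wrap it in a ```sql fence;
    fall back to the raw text, normalized to end with a single semicolon."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    sql = match.group(1) if match else text
    return sql.strip().rstrip(";") + ";"

raw = "```sql\nSELECT department, AVG(salary) FROM employees GROUP BY department;\n```"
print(extract_sql(raw))
```

Apply it to `out["choices"][0]["text"]` from the snippet above before running the query.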
LM Studio / Kobold / text-generation-webui
- Select the .gguf file and load it.
- Set temperature 0.1–0.3 for deterministic SQL.
- Use a system prompt to anchor behavior.
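One way to anchor behavior is a system message in a chat-style request. A minimal sketch (the system-prompt wording is an example, not the model's training format):

```python
# Hypothetical system prompt; tune the wording for your schema and dialect.
SYSTEM_PROMPT = (
    "You are a Text-to-SQL assistant. Answer with a single valid SQL query "
    "and nothing else. Use only tables and columns from the provided schema."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "List the ten highest-paid employees."},
]

# In llama-cpp-python, pass this list to llm.create_chat_completion(messages=messages);
# in LM Studio or text-generation-webui, paste SYSTEM_PROMPT into the system-prompt field.
print(messages[0]["content"])
```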
Model details
- Base. Qwen/Qwen3-4B-Instruct-2507 (32k context, multilingual).
- Fine-tune. Trained on gretelai/synthetic_text_to_sql.
- Task. NL → SQL. Capable of simple schema inference when needed.
- Languages. Works best in English; can follow prompts in several languages inherited from the base model.
Conversion reproducibility
Export used:
python convert_hf_to_gguf.py /path/to/hf_model --outtype f16 --outfile Qwen-3-4b-Text_to_SQL-F16.gguf
Quantization used:
./llama-quantize Qwen-3-4b-Text_to_SQL-F16.gguf Qwen-3-4b-Text_to_SQL-q4_k_m.gguf Q4_K_M
# likewise for q2_k, q3_k_m, q5_k_m, q8_0
Intended use and limits
- Use. Analytics, reporting, dashboards, data exploration, SQL prototyping.
- Limits. No database connectivity. It only generates SQL text. Validate and test queries before use in production. Provide real schema for best accuracy.
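Since the model only emits SQL text, generated queries should be checked before they touch real data. One lightweight approach, sketched here as an assumption (SQLite dialect only; adapt for other engines), is to prepare the query with EXPLAIN against an empty in-memory database built from the schema:

```python
import sqlite3

def validate_sql(sql: str, schema: str) -> bool:
    """Syntax-check a generated query by preparing it with EXPLAIN on an
    empty in-memory SQLite database built from the schema. Catches bad
    table and column names without touching real data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE employees (id INTEGER, department TEXT, salary REAL);"
print(validate_sql("SELECT department, AVG(salary) FROM employees GROUP BY department", schema))
print(validate_sql("SELECT nope FROM missing_table", schema))
```

This catches syntax and name errors only; it does not verify that the query answers the question.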
Attribution
- Base model:
Qwen/Qwen3-4B-Instruct-2507
- Fine-tuned model:
Ellbendls/Qwen-3-4b-Text_to_SQL
License
Apache-2.0. Include license and NOTICE from upstream when redistributing the weights. Do not imply endorsement from Qwen or original authors.
Changelog
- 2025-09-17. Initial GGUF release. Added q2_k, q3_k_m, q4_k_m, q5_k_m, q8_0, and F16.