Improve model card: Add GitHub link and Python usage example

#1
opened by nielsr (HF Staff)
Files changed (1)
README.md +81 -21
README.md CHANGED
@@ -1,20 +1,20 @@
  ---
- pipeline_tag: text-generation
  library_name: transformers
  license: cc-by-nc-4.0
  tags:
  - text-to-sql
  - reinforcement-learning
  ---

-
  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL

  ### Important Links

- 📖[Arxiv Paper](https://arxiv.org/abs/2507.22478) |
- 🤗[HuggingFace](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) |
- 🤖[ModelScope](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643) |

  ## News
 
@@ -35,8 +35,8 @@ tags:
  > effectiveness
  > and generalizability of our method, SLM-SQL. On the BIRD development set, the five evaluated models achieved an
  > average
- > improvement of 31.4 points. Notably, the 0.5B model reached 56.87\% execution accuracy (EX), while the 1.5B model
- > achieved 67.08\% EX. We will release our dataset, model, and code to github: https://github.com/CycloneBoy/slm_sql.

  ### Framework

@@ -55,29 +55,89 @@ Performance Comparison of different Text-to-SQL methods on BIRD dev and test dat

  <img src="https://raw.githubusercontent.com/CycloneBoy/slm_sql/main/data/image/slmsql_ablation_study.png" height="300" alt="slmsql_ablation_study">

  ## Model

  | **Model** | Base Model | Train Method | Modelscope | HuggingFace |
  |---|---|---|---|---|
- | SLM-SQL-Base-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.5B) |
- | SLM-SQL-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.5B) |
- | CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) |
- | SLM-SQL-Base-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.5B) |
- | SLM-SQL-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.5B) |
- | CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) |
- | SLM-SQL-Base-0.6B | Qwen3-0.6B | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.6B) |
- | SLM-SQL-0.6B | Qwen3-0.6B | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.6B) |
- | SLM-SQL-Base-1.3B | deepseek-coder-1.3b-instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.3B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.3B) |
- | SLM-SQL-1.3B | deepseek-coder-1.3b-instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.3B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.3B) |
- | SLM-SQL-Base-1B | Llama-3.2-1B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1B) |

  ## Dataset

  | **Dataset** | Modelscope | HuggingFace |
  |---|---|---|
- | SynsQL-Think-916k | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/SynsQL-Think-916k) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/SynsQL-Think-916k) |
- | SynsQL-Merge-Think-310k | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/SynsQL-Merge-Think-310k) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/SynsQL-Merge-Think-310k) |
- | bird train and dev dataset | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/bird_train) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/bird_train) |

  ## TODO
 
@@ -1,20 +1,20 @@
  ---
  library_name: transformers
  license: cc-by-nc-4.0
+ pipeline_tag: text-generation
  tags:
  - text-to-sql
  - reinforcement-learning
  ---

  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL

  ### Important Links

+ 📖[Arxiv Paper](https://arxiv.org/abs/2507.22478) |
+ 💻[GitHub](https://github.com/CycloneBoy/slm_sql) |
+ 🤗[HuggingFace](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) |
+ 🤖[ModelScope](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643) |

  ## News
 
@@ -35,8 +35,8 @@ tags:
  > effectiveness
  > and generalizability of our method, SLM-SQL. On the BIRD development set, the five evaluated models achieved an
  > average
+ > improvement of 31.4 points. Notably, the 0.5B model reached 56.87% execution accuracy (EX), while the 1.5B model
+ > achieved 67.08% EX. We will release our dataset, model, and code to github: https://github.com/CycloneBoy/slm_sql.

  ### Framework

@@ -55,29 +55,89 @@

  <img src="https://raw.githubusercontent.com/CycloneBoy/slm_sql/main/data/image/slmsql_ablation_study.png" height="300" alt="slmsql_ablation_study">

+ ## Usage
+
+ Here's how to use the model for Text-to-SQL generation.
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "cycloneboy/SLM-SQL-0.5B"  # Or choose another model from the table above
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Example question; in practice, include the database schema in the prompt as well.
+ question = "How many members are there in the department of 'Sales'?"
+
+ # The chat template is critical for proper inference as the model is instruction-tuned.
+ messages = [
+     {"role": "user", "content": f"Generate a SQL query for the following question:\n{question}"},
+ ]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Tokenize and generate
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
+ outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.01, top_p=0.95)
+
+ # Decode only the newly generated tokens, skipping special tokens
+ generated_sql = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ print(f"Generated SQL: {generated_sql.strip()}")
+ ```
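+
+ Decoding only the tokens generated after the prompt keeps the printed result independent of the underlying chat template (the Qwen-, Llama-, and deepseek-coder-based variants each ship their own template), and with `temperature=0.01` sampling is effectively greedy, which suits deterministic SQL generation.
+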
  ## Model

  | **Model** | Base Model | Train Method | Modelscope | HuggingFace |
  |---|---|---|---|---|
+ | SLM-SQL-Base-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.5B) |
+ | SLM-SQL-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.5B) |
+ | CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) |
+ | SLM-SQL-Base-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.5B) |
+ | SLM-SQL-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.5B) |
+ | CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) |
+ | SLM-SQL-Base-0.6B | Qwen3-0.6B | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.6B) |
+ | SLM-SQL-0.6B | Qwen3-0.6B | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.6B) |
+ | SLM-SQL-Base-1.3B | deepseek-coder-1.3b-instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.3B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.3B) |
+ | SLM-SQL-1.3B | deepseek-coder-1.3b-instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.3B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.3B) |
+ | SLM-SQL-Base-1B | Llama-3.2-1B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1B) |

  ## Dataset

  | **Dataset** | Modelscope | HuggingFace |
  |---|---|---|
+ | SynsQL-Think-916k | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/SynsQL-Think-916k) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/SynsQL-Think-916k) |
+ | SynsQL-Merge-Think-310k | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/SynsQL-Merge-Think-310k) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/SynsQL-Merge-Think-310k) |
+ | bird train and dev dataset | [🤖 Modelscope](https://modelscope.cn/datasets/cycloneboy/bird_train) | [🤗 HuggingFace](https://huggingface.co/datasets/cycloneboy/bird_train) |
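+
+ The released data lives in standard Hub dataset repos, so it should be loadable with the `datasets` library. A minimal sketch, assuming the default configuration and split layout (check each dataset card for the actual names):
+
+ ```python
+ from datasets import load_dataset
+
+ # Assumed default-config load; the exact configs/splits are defined by the dataset repo.
+ ds = load_dataset("cycloneboy/SynsQL-Merge-Think-310k")
+ print(ds)  # inspect available splits and columns
+ ```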
 
  ## TODO