Improve model card for SLM-SQL: Add GitHub link and usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +55 -21
README.md CHANGED
@@ -1,42 +1,26 @@
  ---
- pipeline_tag: text-generation
  library_name: transformers
  license: cc-by-nc-4.0
  tags:
  - text-to-sql
  - reinforcement-learning
  ---

-
  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL

  ### Important Links

- πŸ“–[Arxiv Paper](https://arxiv.org/abs/2507.22478) |
- πŸ€—[HuggingFace](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) |
- πŸ€–[ModelScope](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643) |

  ## News

  + `July 31, 2025`: Uploaded models to ModelScope and Hugging Face.
  + `July 30, 2025`: Published the paper on arXiv.

- ## Introduction
-
- > Large language models (LLMs) have demonstrated strong performance in translating natural language questions into SQL
- > queries (Text-to-SQL). In contrast, small language models (SLMs) ranging from 0.5B to 1.5B parameters currently
- > underperform on Text-to-SQL tasks due to their limited logical reasoning capabilities. However, SLMs offer inherent
- > advantages in inference speed and suitability for edge deployment. To explore their potential in Text-to-SQL
- > applications, we leverage recent advancements in post-training techniques. Specifically, we used the open-source
- > SynSQL-2.5M dataset to construct two derived datasets: SynSQL-Think-916K for SQL generation and
- > SynSQL-Merge-Think-310K for SQL merge revision. We then applied supervised fine-tuning and reinforcement
- > learning-based post-training to the SLM, followed by inference using a corrective self-consistency approach.
- > Experimental results validate the effectiveness and generalizability of our method, SLM-SQL. On the BIRD
- > development set, the five evaluated models achieved an average improvement of 31.4 points. Notably, the 0.5B
- > model reached 56.87% execution accuracy (EX), while the 1.5B model achieved 67.08% EX. We will release our
- > dataset, model, and code to GitHub: https://github.com/CycloneBoy/slm_sql.

  ### Framework

@@ -55,6 +39,56 @@ Performance Comparison of different Text-to-SQL methods on BIRD dev and test dat

  <img src="https://raw.githubusercontent.com/CycloneBoy/slm_sql/main/data/image/slmsql_ablation_study.png" height="300" alt="slmsql_ablation_study">
  ## Model
  | **Model** | Base Model | Train Method | Modelscope | HuggingFace |
 
  ---
  library_name: transformers
  license: cc-by-nc-4.0
+ pipeline_tag: text-generation
  tags:
  - text-to-sql
  - reinforcement-learning
  ---

  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL

  ### Important Links

+ πŸ“–[Arxiv Paper](https://arxiv.org/abs/2507.22478) | πŸ€—[Hugging Face Paper](https://huggingface.co/papers/2507.22478) | πŸ™[GitHub Repository](https://github.com/CycloneBoy/slm_sql) | πŸ€—[HuggingFace Collection](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) | πŸ€–[ModelScope Collection](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643)

  ## News

  + `July 31, 2025`: Uploaded models to ModelScope and Hugging Face.
  + `July 30, 2025`: Published the paper on arXiv.

+ ## Abstract
+
+ Large language models (LLMs) have demonstrated strong performance in translating natural language questions into SQL queries (Text-to-SQL). In contrast, small language models (SLMs) ranging from 0.5B to 1.5B parameters currently underperform on Text-to-SQL tasks due to their limited logical reasoning capabilities. However, SLMs offer inherent advantages in inference speed and suitability for edge deployment. To explore their potential in Text-to-SQL applications, we leverage recent advancements in post-training techniques. Specifically, we used the open-source SynSQL-2.5M dataset to construct two derived datasets: SynSQL-Think-916K for SQL generation and SynSQL-Merge-Think-310K for SQL merge revision. We then applied supervised fine-tuning and reinforcement learning-based post-training to the SLM, followed by inference using a corrective self-consistency approach. Experimental results validate the effectiveness and generalizability of our method, SLM-SQL. On the BIRD development set, the five evaluated models achieved an average improvement of 31.4 points. Notably, the 0.5B model reached 56.87% execution accuracy (EX), while the 1.5B model achieved 67.08% EX.
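+
+ The corrective self-consistency inference can be pictured as execution-based majority voting over sampled SQL candidates. The sketch below is an illustration only, assuming a SQLite database and hypothetical helper names (`execute_sql`, `self_consistency_vote`); the paper's corrective variant additionally applies merge revision with the model trained on SynSQL-Merge-Think-310K:
+
+ ```python
+ import sqlite3
+ from collections import Counter
+
+ # Hypothetical helpers: a minimal sketch, not the paper's implementation.
+ def execute_sql(db_path, sql):
+     """Run one candidate query; return a hashable result, or None on failure."""
+     conn = sqlite3.connect(db_path)
+     try:
+         rows = conn.execute(sql).fetchall()
+         return tuple(sorted(map(str, rows)))
+     except Exception:
+         return None  # unexecutable candidates are discarded
+     finally:
+         conn.close()
+
+ def self_consistency_vote(db_path, candidates):
+     """Group candidates by execution result; return one query from the largest group."""
+     votes, representative = Counter(), {}
+     for sql in candidates:
+         result = execute_sql(db_path, sql)
+         if result is None:
+             continue
+         votes[result] += 1
+         representative.setdefault(result, sql)  # keep the first query per result
+     if not votes:
+         return None  # no candidate executed successfully
+     best_result, _ = votes.most_common(1)[0]
+     return representative[best_result]
+ ```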

  ### Framework

  <img src="https://raw.githubusercontent.com/CycloneBoy/slm_sql/main/data/image/slmsql_ablation_study.png" height="300" alt="slmsql_ablation_study">

+ ## How to Use
+
+ You can use this model with the Hugging Face `transformers` library. Below is a general example for inference:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Load the model and tokenizer
+ model_name = "cycloneboy/SLM-SQL-1.5B"  # example: you can choose other models from the table below
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,  # or torch.float16, depending on your GPU
+     device_map="auto"            # automatically place the model on available devices
+ )
+ model.eval()
+
+ # Example prompt for Text-to-SQL
+ # Replace this with your natural language question and database schema
+ prompt = """
+ [Instruction]: Given the following database schema, generate a SQL query that answers the question.
+ [Schema]:
+ CREATE TABLE Student (StuID INT, Name TEXT, Age INT, Sex TEXT, Major TEXT, Advisor INT, Graduated BOOL);
+ CREATE TABLE Course (CrsID INT, Title TEXT, Dept TEXT, Credits INT);
+ CREATE TABLE Enrollment (StuID INT, CrsID INT, Grade REAL);
+ CREATE TABLE Advisor (AdvID INT, Name TEXT, Dept TEXT);
+ [Question]: What is the average age of students who are taking the 'Database' course?
+ """
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ # Generate the SQL query (greedy decoding; adjust for other strategies)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=256,
+     do_sample=False,
+     eos_token_id=tokenizer.eos_token_id
+ )
+
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(generated_text)
+
+ # The output contains the prompt followed by the generated SQL;
+ # parse generated_text to extract only the SQL query (see below).
+ ```
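+
+ Since the generation echoes the prompt, you will typically post-process it. Below is a minimal, heuristic extraction step (an illustration only; the exact slicing depends on your tokenizer and prompt template):
+
+ ```python
+ # Heuristic post-processing of `generated_text` from the example above.
+ # Assumes the decoded output starts with the prompt, followed by the SQL.
+ sql = generated_text[len(prompt):].strip()
+
+ # If the model wraps the query in a markdown code fence, unwrap it.
+ if sql.startswith("```"):
+     sql = sql.strip("`\n")
+     sql = sql.removeprefix("sql").strip()
+
+ print(sql)
+ ```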
+
  ## Model

  | **Model** | Base Model | Train Method | Modelscope | HuggingFace |