# Fine-tune DeepSeek with a Synthetic Reasoning Dataset

This notebook fine-tunes `unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit` on a synthetic reasoning dataset.

It covers the full fine-tuning workflow once the synthetic data has been generated with the Synthetic Data Generator: loading the model, formatting the dataset, training, saving, and running inference. For the methodology and further details, see the blog post: [Fine-tune DeepSeek with a Synthetic Reasoning Dataset](https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data).

## Getting Started

### Install the Dependencies

In [None]:
!pip install datasets
!pip install unsloth

### Import the Required Libraries

In [None]:
# Import unsloth first so its patches are applied before transformers and trl load
from unsloth import FastLanguageModel, is_bfloat16_supported

import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

### Configure the Environment

In [None]:
MODEL = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit"
REPO_NAME = "sdiazlor"  # replace with your Hugging Face username
MODEL_NAME = "deepseek-r1-distill-qwen-1.5-unsloth-sft-python"

## Load the Model and Tokenizer

In [None]:
# Load the 4-bit pre-quantized DeepSeek model and its tokenizer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL,
    max_seq_length=2048,
    dtype=None,  # None lets Unsloth pick the dtype automatically
    load_in_4bit=True,
)

In [None]:
# Add the LoRA adapters to the model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.2.5 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
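
The trainable-parameter count that the training log reports later in this notebook (18,464,768) can be checked by hand. The sketch below assumes the layer shapes of the Qwen2 1.5B architecture (hidden size 1536, MLP intermediate size 8960, key/value projection width 256, 28 layers); these values are not stated in the notebook, so treat them as assumptions. Each adapted projection adds two low-rank matrices, which costs `r * (in_features + out_features)` parameters.

```python
# Assumed layer shapes for the Qwen2 1.5B architecture (not stated in this notebook):
# hidden size 1536, MLP intermediate size 8960, key/value projection width 256, 28 layers.
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 1536, 8960, 256, 28
RANK, LORA_ALPHA = 16, 16  # same values as the get_peft_model call above

# (input features, output features) of every projection targeted by LoRA
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Each adapted layer adds A (rank x in) and B (out x rank), with no bias."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
scaling = LORA_ALPHA / RANK  # standard LoRA scaling, since use_rslora=False
print(f"{trainable:,} trainable parameters, update scaled by {scaling}")
```

Under these assumed shapes the count matches the figure in the log, and with `r` equal to `lora_alpha` the adapter update is applied with a scaling factor of 1.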


## Pre-process the Synthetic Data

In [None]:
# Prepare the dataset

prompt_style = """Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answer the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are an expert programmer with advanced knowledge in Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.

### Question:
{}

### Response:
{}
"""

EOS_TOKEN = tokenizer.eos_token


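def formatting_prompts_func(examples):
    """Fill the prompt template for a batch of examples and append the EOS token."""
    prompts = examples["prompt"]
    completions = examples["completion"]
    texts = []
    for prompt, completion in zip(prompts, completions):
        # Without the EOS token, the model would not learn where a response ends
        text = prompt_style.format(prompt, completion) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }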


dataset = load_dataset("sdiazlor/python-reasoning-dataset", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

'Below is an instruction that describes a task, paired with a question that provides further context. \nWrite a response that appropriately answer the question. \nBefore answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are an expert programmer with advanced knowledge in Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question. \n\n### Question:\nHow can I solve a Rubik\'s Cube? \n\n\n\n### Response:\nOkay, so I want to solve a Rubik\'s Cube, but I\'m not really sure where to start. I remember seeing people solve them quickly, but it looks so complicated. Let me try to break it down.\n\nFirst, I think I need to understand the structure of the cube. It has six faces, each with nine smaller squares of different colors. The goal is to have each face all one color. But how do the pieces move?\n\nI\'ve heard ab
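
To see what the batched mapping does without downloading the dataset or the tokenizer, the following minimal sketch runs the same logic on a toy batch. `toy_prompt_style`, `TOY_EOS`, `format_batch`, and the two example rows are hypothetical stand-ins; in the notebook the template is `prompt_style` and the end-of-sequence marker comes from `tokenizer.eos_token`.

```python
# Shortened stand-in for prompt_style, plus a placeholder end-of-sequence marker;
# in the notebook the real marker comes from tokenizer.eos_token.
toy_prompt_style = "### Question:\n{}\n\n### Response:\n{}"
TOY_EOS = "<eos>"


def format_batch(examples, template=toy_prompt_style, eos=TOY_EOS):
    """Mirror formatting_prompts_func: a batch is a dict of equal-length lists."""
    texts = [
        template.format(prompt, completion) + eos
        for prompt, completion in zip(examples["prompt"], examples["completion"])
    ]
    return {"text": texts}


# With batched=True, the mapped function receives columns as lists, not single rows
batch = {
    "prompt": ["What does len([1, 2, 3]) return?", "How do I reverse a list?"],
    "completion": ["It returns 3.", "Use my_list[::-1]."],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

The function returns a dict with one new column, `text`, which is the field the trainer reads during training.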

## Train the Model

In [None]:
# Configure the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)
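
The combination of `warmup_steps=5`, `learning_rate=2e-4`, and `lr_scheduler_type="linear"` describes a schedule that ramps up linearly and then decays linearly to zero. The sketch below is a plain-Python approximation of that shape, not the trainer's own implementation; `linear_lr` is a hypothetical helper, and `total_steps=186` is taken from the training log in this notebook.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=186):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup steps
        return base_lr * step / warmup_steps
    # Decay from base_lr down to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 1, 5, 100, 186):
    print(step, f"{linear_lr(step):.2e}")
```

With only five warmup steps out of 186, almost all of the run is spent in the decay phase, so the earliest logged losses are produced at a reduced learning rate.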

In [None]:
# Train the model
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 186
 "-____-"     Number of trainable parameters = 18,464,768


| Step | Training Loss |
|------|---------------|
| 1 | 0.8284 |
| 2 | 0.7844 |
| 3 | 0.8388 |
| 4 | 0.8495 |
| 5 | 0.7323 |
| 6 | 0.6701 |
| 7 | 0.7099 |
| 8 | 0.6885 |
| 9 | 0.6136 |
| 10 | 0.6265 |
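
The batch size and step count in the training log follow from the trainer settings. The arithmetic below is one way to reproduce them; it assumes that a trailing partial accumulation group at the end of an epoch is not counted as an extra optimizer update, which is consistent with the 186 steps reported but may differ between library versions.

```python
import math

num_examples = 500        # from the training log
per_device_batch = 2      # per_device_train_batch_size
grad_accum = 4            # gradient_accumulation_steps
num_epochs = 3            # num_train_epochs
num_gpus = 1              # from the training log

# Examples consumed per optimizer update
effective_batch = per_device_batch * grad_accum * num_gpus
# Forward/backward passes needed to see every example once
micro_batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
# Assumption: a trailing partial accumulation group is not counted as an extra update
updates_per_epoch = micro_batches_per_epoch // grad_accum
total_steps = updates_per_epoch * num_epochs

print(effective_batch, updates_per_epoch, total_steps)
```

Because `logging_steps=1`, each row of the loss table corresponds to one of these optimizer updates, that is, to eight training examples.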


In [None]:
# Save the LoRA adapters and tokenizer locally, then save a merged 16-bit copy
model.save_pretrained(MODEL_NAME)
tokenizer.save_pretrained(MODEL_NAME)
model.save_pretrained_merged(MODEL_NAME, tokenizer, save_method="merged_16bit")

# Push the adapters, the merged model, and a GGUF export to the Hub
fine_tuned_model = f"{REPO_NAME}/{MODEL_NAME}"
model.push_to_hub(fine_tuned_model, safe_serialization=None)
tokenizer.push_to_hub(fine_tuned_model, safe_serialization=None)
model.push_to_hub_merged(fine_tuned_model, tokenizer, save_method="merged_16bit")  # for vLLM
model.push_to_hub_gguf(
    f"{fine_tuned_model}_q4_k_m", tokenizer, quantization_method="q4_k_m"
)  # as GGUF

## Run Inference

In [None]:
# Run inference
question = "How can I get the prime numbers from 0 to 125?"

FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


Okay, so I need to find all the prime numbers between 0 and 125. Hmm, primes are numbers greater than 1 that have no divisors other than 1 and themselves. So, first, I should probably start by listing all numbers from 0 to 125 and then eliminate the non-primes.

Wait, but 0 and 1 aren't primes. So I can ignore them. The smallest prime is 2. So maybe I should start checking from 2 onwards.

I remember that one method to check for primes is the Sieve of Eratosthenes. That's an efficient algorithm for finding all primes up to a certain limit. Let me think about how that works. The idea is to create a list of all numbers up to the limit and then iteratively mark the multiples of each prime starting from 2. The numbers that remain unmarked are primes.

So, applying this to the range 0-125. I'll create a list of booleans from 0 to 125, initializing them all to True except index 0 and 1 which are False. Then, for each number starting from 2, if it's still marked as prime, I'll mark all its mu
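
The inference cell isolates the answer with `split("### Response:")[1]`, which drops anything after a second occurrence of the marker and leaves any end-of-sequence token in place. A slightly more defensive variant is sketched below; `extract_response` is a hypothetical helper, and `"<eos>"` is a placeholder for the real `tokenizer.eos_token`. Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is another way to drop special tokens.

```python
def extract_response(decoded, marker="### Response:", eos_token="<eos>"):
    """Return everything after the first marker, without the EOS token."""
    _, found, tail = decoded.partition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in the decoded text")
    return tail.replace(eos_token, "").strip()


# Toy decoded string standing in for tokenizer.batch_decode(outputs)[0]
decoded = "### Question:\nWhat is 2 + 2?\n\n### Response:\nOkay, 2 + 2 is 4.<eos>"
print(extract_response(decoded))
```

Raising an error when the marker is missing makes a changed prompt template fail loudly instead of surfacing as an `IndexError` on the split result.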