Fine-Tuning Phi-4 with Unsloth

Community Article · Published May 15, 2025

This tutorial will guide you through the process of fine-tuning a language model using the Unsloth library. We'll use a pre-trained model and fine-tune it on a custom dataset.

Prerequisites

Before you start, ensure you have the following installed:

• Python 3.8 or later
• PyTorch
• Unsloth
• Hugging Face transformers, datasets, and trl

You can install the necessary libraries using pip:

pip install torch unsloth transformers datasets trl

Step 1: Import Required Libraries

First, import the necessary libraries and modules:

from unsloth import FastLanguageModel
import torch
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

Step 2: Load the Pre-trained Model

Load a pre-trained model with Unsloth. You can choose from its list of supported models:

max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
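If you want to confirm how much GPU memory the 4-bit checkpoint actually occupies, a quick optional check (not part of the original tutorial; assumes a single CUDA device) is:

gpu = torch.cuda.get_device_properties(0)
reserved_gb = torch.cuda.max_memory_reserved() / 1024 ** 3
print(f"GPU: {gpu.name}, total memory: {gpu.total_memory / 1024 ** 3:.2f} GB")
print(f"Memory reserved after loading the 4-bit model: {reserved_gb:.2f} GB")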

Step 3: Prepare the Model for Fine-Tuning

Prepare the model for fine-tuning using PEFT (Parameter-Efficient Fine-Tuning):

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
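Since get_peft_model returns a standard PEFT-wrapped model, you can optionally verify that the LoRA adapters were attached and see how small the trainable fraction is:

model.print_trainable_parameters()
# prints something like: trainable params: ... || all params: ... || trainable%: ...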

Step 4: Load and Prepare the Dataset

Load your dataset and format it for training. In this example, we'll use a dataset with 'instruction', 'input', and 'output' fields:

# Load the training split directly; without split='train', load_dataset returns a DatasetDict
# and iterating over it would yield split names instead of examples
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')

text_data = {'text': []}

for example in dataset:
    input_text = example['input']
    output_text = example['output']
    text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"
    text_data['text'].append(text_format)

train_dataset = Dataset.from_dict(text_data)
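Building a Python dict in a loop works, but the same formatting can also be done with the datasets library's map method, which processes the data row by row instead of holding every string in one dict. A minimal equivalent sketch using the same prompt template as above:

def format_example(example):
    text = (
        "<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|>"
        f"<|user|>{example['input']}<|end|>"
        f"<|assistant|>{example['output']}<|end|>"
    )
    return {"text": text}

# Keep only the formatted 'text' column, which is what SFTTrainer reads
train_dataset = dataset.map(format_example, remove_columns=dataset.column_names)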

Step 5: Set Up the Trainer

Set up the SFTTrainer with the necessary training arguments:

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=50,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        save_steps=5,
        save_total_limit=10,
        report_to="none",
    ),
)
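With these settings, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 2 × 4 = 8 sequences per optimizer step, so max_steps=50 touches only about 400 training examples. For a full run over the 110k-example dataset, raise max_steps (or switch to num_train_epochs) accordingly.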

Step 6: Train the Model

Train the model using the SFTTrainer:

trainer_stats = trainer.train()
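trainer.train() returns a TrainOutput object whose metrics field records the final loss and total runtime, which is handy for quick comparisons between runs:

print(trainer_stats.metrics)  # includes train_runtime, train_samples_per_second, train_loss, ...
print(f"Training took {trainer_stats.metrics['train_runtime']:.1f} seconds")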

Step 7: Save the Model

Save the fine-tuned model and tokenizer:

model.save_pretrained("drakidol-Phi-4-lora_model")
tokenizer.save_pretrained("drakidol-Phi-4-lora_model")  # keep the tokenizer alongside the adapter
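These calls save only the LoRA adapter weights (plus the tokenizer), not a merged full model. To reload the adapter later for inference, you can point FastLanguageModel.from_pretrained at the saved directory; a sketch assuming the local path used above:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="drakidol-Phi-4-lora_model",  # local adapter directory saved above
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)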

Step 8: Test the Model

Test the model by generating responses to prompts:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.5, min_p=0.1)
tokenizer.batch_decode(outputs)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)

This tutorial provides a basic guide to fine-tuning a language model using the Unsloth library. You can customize the dataset, model, and training parameters as needed for your specific use case.

Full Program

from unsloth import FastLanguageModel  # use FastVisionModel for vision-language models
import torch
max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # Llama-3.1 2x faster
    "unsloth/Mistral-Small-Instruct-2409",  # Mistral 22b 2x faster!
    "unsloth/Phi-4",  # Phi-4 2x faster!
    "unsloth/Phi-4-unsloth-bnb-4bit",  # Phi-4 Unsloth Dynamic 4-bit Quant
    "unsloth/gemma-2-9b-bnb-4bit",  # Gemma 2x faster!
    "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"  # Qwen 2.5 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",  # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
]  # More models at https://docs.unsloth.ai/get-started/all-our-models

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

# Target chat format, e.g.: <|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>{assistant's answer}<|end|>

from datasets import Dataset, load_from_disk, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Load the dataset
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')

# Assuming the dataset has 'instruction', 'input', and 'output' fields
text_data = {
    # 'instruction': [],
    # 'input': [],
    # 'output': [],
    'text': []
}

# Iterate over the dataset and format the data
for example in dataset:  # Iterate directly over the dataset
    instruction = example['instruction']
    input_text = example['input']
    output_text = example['output']

    # Format the text
    text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"

    # Append the formatted data
    # text_data['instruction'].append(instruction)
    # text_data['input'].append(input_text)
    # text_data['output'].append(output_text)
    text_data['text'].append(text_format)

# Convert the dictionary to a Dataset object
train_dataset = Dataset.from_dict(text_data)
del text_data
# Print the first entry of the dataset as a sanity check
for i, row in enumerate(train_dataset.select(range(1))):
    print(f"Row {i + 1}:")
    for key in row.keys():
        print(f"{key}: {row[key]}")
    print("\n")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences.
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=50,  # Train for only 50 steps here; increase for a full run
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        save_steps=5,  # Save the model every 5 steps
        save_total_limit=10,  # Keep only the 10 most recent checkpoints
        report_to="none",  # Use this for WandB etc
        #resume_from_checkpoint=True,  # Resume from the latest checkpoint
        #resume_from_checkpoint=checkpoint_path,  # Resume from the specified checkpoint
    ),
)

trainer_stats = trainer.train()
trainer.model.save_pretrained("drakidol-Phi-4-lora_model")  # Local saving

#model test
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "phi-4",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs, max_new_tokens = 64, use_cache = True, temperature = 1.5, min_p = 0.1
)
tokenizer.batch_decode(outputs)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
    use_cache = True, temperature = 1.5, min_p = 0.1
)

# Save the trained model and tokenizer
model.save_pretrained("drakidol-Phi-4_model")
tokenizer.save_pretrained("drakidol-Phi-4_model")

Possible problems

During training you may see torch.compile GraphModule dumps like the one below (repeated for each compiled module), followed by a torch._dynamo error about data-dependent branching in the LongRoPE frequency update:

class GraphModule(torch.nn.Module):
    def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", L_args_1_: "bf16[s0, s1, s2][s1*s2, s2, 1]cuda:0", L_args_2_: "i64[1, s1][s1, 1]cuda:0"):
        l_args_1_ = L_args_1_
        l_args_2_ = L_args_2_

        # No stacktrace found for following nodes
        _set_grad_enabled = torch._C._set_grad_enabled(False);  _set_grad_enabled = None

         # File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:45 in longrope_frequency_update, code: seq_len = torch.max(position_ids) + 1
        max_1: "i64[][]cuda:0" = torch.max(l_args_2_);  l_args_2_ = None
        seq_len: "i64[][]cuda:0" = max_1 + 1;  max_1 = None

         # File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:50 in longrope_frequency_update, code: if seq_len > original_max_position_embeddings:
        gt: "b8[][]cuda:0" = seq_len > 4096;  seq_len = gt = None


Traceback (most recent call last):
  File "/home/ubuntu/model/Phi-4/1.py", line 115, in <module>
    trainer_stats = trainer.train()
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
    return inner_training_loop(
  File "<string>", line 315, in _fast_inner_training_loop
  File "<string>", line 31, in _unsloth_training_step
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/UnslothSFTTrainer.py", line 748, in compute_loss
    outputs = super().compute_loss(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1039, in _unsloth_pre_compute_loss
    outputs = self._old_compute_loss(model, inputs, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 3801, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 814, in forward
    return model_forward(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 802, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/peft_model.py", line 1757, in forward
    return self.base_model(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 594, in forward
    return Phi3ForCausalLM_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, cache_position, logits_to_keep, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 417, in Phi3ForCausalLM_forward
    outputs: BaseModelOutputWithPast = self.model(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 567, in forward
    position_embeddings = self.rotary_emb(hidden_states, position_ids)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 357, in forward
    return Phi3RotaryEmbedding_forward(self, x, position_ids)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 659, in _fn
    raise e.with_traceback(None) from None
torch._dynamo.exc.Unsupported: Data-dependent branching
  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.
  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
  Hint: Use `torch.cond` to express dynamic control flow.

  Developer debug context: attempted to jump with TensorVariable()


from user code:
   File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 70, in inner
    return fn(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 86, in wrapper
    longrope_frequency_update(self, position_ids, device=x.device)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 50, in longrope_frequency_update
    if seq_len > original_max_position_embeddings:

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

Solution

Disabling Dynamo: if you suspect that TorchDynamo is causing the issue, you can try disabling it by setting the environment variable TORCHDYNAMO_DISABLE to 1 before launching the script:

export TORCHDYNAMO_DISABLE=1
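The same flag can also be set from inside the script; setting it before torch and unsloth are imported is safest (a small sketch using the same environment variable as above):

import os
os.environ["TORCHDYNAMO_DISABLE"] = "1"  # set before importing torch/unsloth

from unsloth import FastLanguageModel
import torch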

Community

Thank you for the tutorial.
How do you verify the correctness of generated responses in Step 8: Test the Model?
There are dataset records with and without \boxed{} in their output column. While it can be used to verify some of the generated responses, verifying the others would be rather subjective.

Article author

Thank you for your insightful question. You've raised a crucial point about evaluating generative models, especially when a clear-cut "correct" answer doesn't exist.

You are right that for responses without an objective marker like the \boxed{} tag, evaluation is more subjective. Here are several effective strategies to assess the quality of your model's generated responses:

  • Human Evaluation: This is the gold standard for evaluating generative models. You can create a test set of prompts and have human reviewers score the outputs based on criteria like:

    • Accuracy and Relevance: Does the response correctly answer the user's question?
    • Fluency: Is the generated text grammatically correct and easy to read?
    • Coherence: Do the sentences and ideas flow logically?
    • Persona Consistency: Does the model adhere to its defined persona, such as "feifei, an AI math expert"?
  • LLM-as-a-Judge: You can use a more advanced language model (like GPT-4 or Claude 3) to act as an evaluator. This is a scalable way to get feedback on your model's performance based on the same criteria used in human evaluation.

  • Side-by-Side Comparison: Compare the outputs of your fine-tuned model with the original, pre-trained model. This will help you determine if the fine-tuning process has improved the quality of the responses for a given set of prompts.

  • Reference-Based Metrics (with a word of caution): While metrics like BLEU and ROUGE can be used, they primarily measure how much the model's output overlaps with a reference text. As you noted, this is not always a reliable indicator of a response's quality, especially for creative or conversational tasks.

By combining these methods, you can get a more holistic view of your model's performance. Thank you for thinking so deeply about this, and I hope these suggestions are helpful for your project.
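As a concrete starting point for the side-by-side comparison above, a minimal sketch could generate answers from the fine-tuned model on a handful of held-out prompts and keep them for review against the base model (the helper name and prompts here are illustrative, not from the tutorial):

def generate_answer(model, tokenizer, prompt, max_new_tokens=128):
    # Apply the chat template, generate, and return only the newly generated text
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    outputs = model.generate(input_ids=inputs, max_new_tokens=max_new_tokens, use_cache=True)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

test_prompts = ["How do I solve 3*x^2 + 4*x + 5 = 1?"]  # add your own held-out prompts

for prompt in test_prompts:
    print(f"PROMPT: {prompt}")
    print(f"FINE-TUNED: {generate_answer(model, tokenizer, prompt)}")
    # Load the original base model separately and repeat to fill in the other column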
