Fine-Tuning Phi-4 with Unsloth

Community Article · Published May 15, 2025

This tutorial will guide you through the process of fine-tuning a language model using the Unsloth library. We'll use a pre-trained model and fine-tune it on a custom dataset.

Prerequisites

Before you start, ensure you have the following installed:

• Python 3.8 or later
• PyTorch
• Unsloth
• Hugging Face transformers, datasets, and trl

You can install the necessary libraries using pip:

pip install torch unsloth transformers datasets trl

Step 1: Import Required Libraries

First, import the necessary libraries and modules:

from unsloth import FastLanguageModel
import torch
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

Step 2: Load the Pre-trained Model

Load a pre-trained model with Unsloth. You can choose from its list of supported models:

max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
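If you want to confirm how much GPU memory the 4-bit checkpoint actually occupies, a quick optional check (not part of the original tutorial; assumes a single CUDA device) is:

gpu = torch.cuda.get_device_properties(0)
reserved_gb = torch.cuda.max_memory_reserved() / 1024 ** 3
print(f"GPU: {gpu.name}, total memory: {gpu.total_memory / 1024 ** 3:.2f} GB")
print(f"Memory reserved after loading the 4-bit model: {reserved_gb:.2f} GB")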

Step 3: Prepare the Model for Fine-Tuning

Prepare the model for fine-tuning using PEFT (Parameter-Efficient Fine-Tuning):

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
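Since get_peft_model returns a standard PEFT-wrapped model, you can optionally verify that the LoRA adapters were attached and see how small the trainable fraction is:

model.print_trainable_parameters()
# prints something like: trainable params: ... || all params: ... || trainable%: ...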

Step 4: Load and Prepare the Dataset

Load your dataset and format it for training. In this example, we'll use a dataset with 'instruction', 'input', and 'output' fields:

# Load the training split directly; without split='train', load_dataset returns a DatasetDict
# and iterating over it would yield split names instead of examples
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')

text_data = {'text': []}

for example in dataset:
    input_text = example['input']
    output_text = example['output']
    text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"
    text_data['text'].append(text_format)

train_dataset = Dataset.from_dict(text_data)
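Building a Python dict in a loop works, but the same formatting can also be done with the datasets library's map method, which processes the data row by row instead of holding every string in one dict. A minimal equivalent sketch using the same prompt template as above:

def format_example(example):
    text = (
        "<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|>"
        f"<|user|>{example['input']}<|end|>"
        f"<|assistant|>{example['output']}<|end|>"
    )
    return {"text": text}

# Keep only the formatted 'text' column, which is what SFTTrainer reads
train_dataset = dataset.map(format_example, remove_columns=dataset.column_names)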

Step 5: Set Up the Trainer

Set up the SFTTrainer with the necessary training arguments:

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=50,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        save_steps=5,
        save_total_limit=10,
        report_to="none",
    ),
)
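With these settings, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 2 × 4 = 8 sequences per optimizer step, so max_steps=50 touches only about 400 training examples. For a full run over the 110k-example dataset, raise max_steps (or switch to num_train_epochs) accordingly.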

Step 6: Train the Model

Train the model using the SFTTrainer:

trainer_stats = trainer.train()
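trainer.train() returns a TrainOutput object whose metrics field records the final loss and total runtime, which is handy for quick comparisons between runs:

print(trainer_stats.metrics)  # includes train_runtime, train_samples_per_second, train_loss, ...
print(f"Training took {trainer_stats.metrics['train_runtime']:.1f} seconds")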

Step 7: Save the Model

Save the fine-tuned model and tokenizer:

model.save_pretrained("drakidol-Phi-4-lora_model")
tokenizer.save_pretrained("drakidol-Phi-4-lora_model")  # keep the tokenizer alongside the adapter
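These calls save only the LoRA adapter weights (plus the tokenizer), not a merged full model. To reload the adapter later for inference, you can point FastLanguageModel.from_pretrained at the saved directory; a sketch assuming the local path used above:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="drakidol-Phi-4-lora_model",  # local adapter directory saved above
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)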

Step 8: Test the Model

Test the model by generating responses to prompts:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.5, min_p=0.1)
tokenizer.batch_decode(outputs)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)

This tutorial provides a basic guide to fine-tuning a language model using the Unsloth library. You can customize the dataset, model, and training parameters as needed for your specific use case.

Full Program

from unsloth import FastLanguageModel  # use FastVisionModel for vision-language models
import torch
max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # Llama-3.1 2x faster
    "unsloth/Mistral-Small-Instruct-2409",  # Mistral 22b 2x faster!
    "unsloth/Phi-4",  # Phi-4 2x faster!
    "unsloth/Phi-4-unsloth-bnb-4bit",  # Phi-4 Unsloth Dynamic 4-bit Quant
    "unsloth/gemma-2-9b-bnb-4bit",  # Gemma 2x faster!
    "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"  # Qwen 2.5 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",  # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
]  # More models at https://docs.unsloth.ai/get-started/all-our-models

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

# Target chat format, e.g.: <|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>{assistant's answer}<|end|>

from datasets import Dataset, load_from_disk, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Load the dataset
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')

# Assuming the dataset has 'instruction', 'input', and 'output' fields
text_data = {
    # 'instruction': [],
    # 'input': [],
    # 'output': [],
    'text': []
}

# Iterate over the dataset and format the data
for example in dataset:  # Iterate directly over the dataset
    instruction = example['instruction']
    input_text = example['input']
    output_text = example['output']

    # Format the text
    text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"

    # Append the formatted data
    # text_data['instruction'].append(instruction)
    # text_data['input'].append(input_text)
    # text_data['output'].append(output_text)
    text_data['text'].append(text_format)

# Convert the dictionary to a Dataset object
train_dataset = Dataset.from_dict(text_data)
del text_data
# Print the first entry of the dataset as a sanity check
for i, row in enumerate(train_dataset.select(range(1))):
    print(f"Row {i + 1}:")
    for key in row.keys():
        print(f"{key}: {row[key]}")
    print("\n")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences.
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=50,  # Train for only 50 steps here; increase for a full run
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        save_steps=5,  # Save the model every 5 steps
        save_total_limit=10,  # Keep only the 10 most recent checkpoints
        report_to="none",  # Use this for WandB etc
        #resume_from_checkpoint=True,  # Resume from the latest checkpoint
        #resume_from_checkpoint=checkpoint_path,  # Resume from the specified checkpoint
    ),
)

trainer_stats = trainer.train()
trainer.model.save_pretrained("drakidol-Phi-4-lora_model")  # Local saving

#model test
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "phi-4",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs, max_new_tokens = 64, use_cache = True, temperature = 1.5, min_p = 0.1
)
tokenizer.batch_decode(outputs)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
    use_cache = True, temperature = 1.5, min_p = 0.1
)

# Save the trained model and tokenizer
model.save_pretrained("drakidol-Phi-4_model")
tokenizer.save_pretrained("drakidol-Phi-4_model")

Possible problems

During training you may see torch.compile GraphModule dumps like the one below (repeated for each compiled module), followed by a torch._dynamo error about data-dependent branching in the LongRoPE frequency update:

class GraphModule(torch.nn.Module):
    def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", L_args_1_: "bf16[s0, s1, s2][s1*s2, s2, 1]cuda:0", L_args_2_: "i64[1, s1][s1, 1]cuda:0"):
        l_args_1_ = L_args_1_
        l_args_2_ = L_args_2_

        # No stacktrace found for following nodes
        _set_grad_enabled = torch._C._set_grad_enabled(False);  _set_grad_enabled = None

         # File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:45 in longrope_frequency_update, code: seq_len = torch.max(position_ids) + 1
        max_1: "i64[][]cuda:0" = torch.max(l_args_2_);  l_args_2_ = None
        seq_len: "i64[][]cuda:0" = max_1 + 1;  max_1 = None

         # File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:50 in longrope_frequency_update, code: if seq_len > original_max_position_embeddings:
        gt: "b8[][]cuda:0" = seq_len > 4096;  seq_len = gt = None


Traceback (most recent call last):
  File "/home/ubuntu/model/Phi-4/1.py", line 115, in <module>
    trainer_stats = trainer.train()
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
    return inner_training_loop(
  File "<string>", line 315, in _fast_inner_training_loop
  File "<string>", line 31, in _unsloth_training_step
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/UnslothSFTTrainer.py", line 748, in compute_loss
    outputs = super().compute_loss(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1039, in _unsloth_pre_compute_loss
    outputs = self._old_compute_loss(model, inputs, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 3801, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 814, in forward
    return model_forward(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 802, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/peft_model.py", line 1757, in forward
    return self.base_model(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 594, in forward
    return Phi3ForCausalLM_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, cache_position, logits_to_keep, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 417, in Phi3ForCausalLM_forward
    outputs: BaseModelOutputWithPast = self.model(
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 567, in forward
    position_embeddings = self.rotary_emb(hidden_states, position_ids)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 357, in forward
    return Phi3RotaryEmbedding_forward(self, x, position_ids)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 659, in _fn
    raise e.with_traceback(None) from None
torch._dynamo.exc.Unsupported: Data-dependent branching
  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.
  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
  Hint: Use `torch.cond` to express dynamic control flow.

  Developer debug context: attempted to jump with TensorVariable()


from user code:
   File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 70, in inner
    return fn(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 86, in wrapper
    longrope_frequency_update(self, position_ids, device=x.device)
  File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 50, in longrope_frequency_update
    if seq_len > original_max_position_embeddings:

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

Solution

Disabling Dynamo: if you suspect that TorchDynamo is causing the issue, you can try disabling it by setting the environment variable TORCHDYNAMO_DISABLE to 1 before launching the script:

export TORCHDYNAMO_DISABLE=1
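The same flag can also be set from inside the script; setting it before torch and unsloth are imported is safest (a small sketch using the same environment variable as above):

import os
os.environ["TORCHDYNAMO_DISABLE"] = "1"  # set before importing torch/unsloth

from unsloth import FastLanguageModel
import torch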

Community

Thank you for the tutorial.
How do you verify the correctness of generated responses in Step 8: Test the Model?
There are dataset records with and without \boxed{} in their output column. While it can be used to verify some of the generated responses, verifying the others would be rather subjective.

Article author

Thank you for your insightful question. You've raised a crucial point about evaluating generative models, especially when a clear-cut "correct" answer doesn't exist.

You are right that for responses without an objective marker like the \boxed{} tag, evaluation is more subjective. Here are several effective strategies to assess the quality of your model's generated responses:

  • Human Evaluation: This is the gold standard for evaluating generative models. You can create a test set of prompts and have human reviewers score the outputs based on criteria like:

    • Accuracy and Relevance: Does the response correctly answer the user's question?
    • Fluency: Is the generated text grammatically correct and easy to read?
    • Coherence: Do the sentences and ideas flow logically?
    • Persona Consistency: Does the model adhere to its defined persona, such as "feifei, an AI math expert"?
  • LLM-as-a-Judge: You can use a more advanced language model (like GPT-4 or Claude 3) to act as an evaluator. This is a scalable way to get feedback on your model's performance based on the same criteria used in human evaluation.

  • Side-by-Side Comparison: Compare the outputs of your fine-tuned model with the original, pre-trained model. This will help you determine if the fine-tuning process has improved the quality of the responses for a given set of prompts.

  • Reference-Based Metrics (with a word of caution): While metrics like BLEU and ROUGE can be used, they primarily measure how much the model's output overlaps with a reference text. As you noted, this is not always a reliable indicator of a response's quality, especially for creative or conversational tasks.

By combining these methods, you can get a more holistic view of your model's performance. Thank you for thinking so deeply about this, and I hope these suggestions are helpful for your project.
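As a concrete starting point for the side-by-side comparison above, a minimal sketch could generate answers from the fine-tuned model on a handful of held-out prompts and keep them for review against the base model (the helper name and prompts here are illustrative, not from the tutorial):

def generate_answer(model, tokenizer, prompt, max_new_tokens=128):
    # Apply the chat template, generate, and return only the newly generated text
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    outputs = model.generate(input_ids=inputs, max_new_tokens=max_new_tokens, use_cache=True)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

test_prompts = ["How do I solve 3*x^2 + 4*x + 5 = 1?"]  # add your own held-out prompts

for prompt in test_prompts:
    print(f"PROMPT: {prompt}")
    print(f"FINE-TUNED: {generate_answer(model, tokenizer, prompt)}")
    # Load the original base model separately and repeat to fill in the other column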
