Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement
This LoRA adapter enhances google/gemma-3-1b-it with structured reasoning capabilities using <think></think> tags. It was trained with GRPO (Group Relative Policy Optimization) on self-generated preference data.
The model wraps its chain-of-thought reasoning in <think></think> tags. Load the adapter on top of the base model with PEFT:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Load the reasoning LoRA adapter
model = PeftModel.from_pretrained(model, "codelion/gemma-3-1b-it-reasoning-grpo-lora")

# Prompt the model to show its reasoning in <think></think> tags
prompt = '''Think step by step and use <think></think> tags to show your reasoning process.
Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 mph for the next hour, how many total miles does it travel?
Response:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
The model will generate responses with structured thinking:

```
<think>
First, I need to find the train's initial speed.
Speed = Distance / Time = 120 miles / 2 hours = 60 mph
For the first 2 hours: 120 miles
For the next hour, speed increases by 30 mph: 60 + 30 = 90 mph
Distance in third hour: 90 mph × 1 hour = 90 miles
Total distance = 120 + 90 = 210 miles
</think>
To solve this step by step:

First, I'll find the train's initial speed:
- Distance = 120 miles, Time = 2 hours
- Speed = 120 ÷ 2 = 60 mph

Next, I'll calculate the distance for each segment:
- First 2 hours: 120 miles (given)
- Third hour: speed increases by 30 mph → 60 + 30 = 90 mph
- Distance in third hour: 90 × 1 = 90 miles

Total distance = 120 + 90 = 210 miles
```
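Because the reasoning is delimited by the tags, it is easy to separate the chain of thought from the final answer. A minimal sketch (the parse_response helper below is illustrative, not part of the adapter):

```python
import re

def parse_response(text: str):
    # Split a response into (reasoning, answer) using the <think></think> tags
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        # No tags found: treat the whole response as the answer
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = parse_response(response)
print("Answer:", answer)
```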
The model was trained on self-generated reasoning problems across multiple domains.
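The exact training recipe lives in the Ellora repository; purely as an illustration, GRPO training with a simple tag-format reward might look like the sketch below using TRL's GRPOTrainer. The prompts, reward function, and hyperparameters here are assumptions, not the recipe's actual values.

```python
# Illustrative GRPO sketch, not the actual Ellora recipe
import re
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def think_format_reward(completions, **kwargs):
    # Reward completions that contain a well-formed <think></think> block
    pattern = re.compile(r"<think>.+?</think>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# Hypothetical prompts; the real recipe self-generates its training problems
dataset = Dataset.from_dict({"prompt": [
    "Think step by step and use <think></think> tags.\nProblem: What is 17 * 24?\nResponse:",
    "Think step by step and use <think></think> tags.\nProblem: A car goes 60 mph for 1.5 hours. How far does it travel?\nResponse:",
]})

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=think_format_reward,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=GRPOConfig(
        output_dir="gemma-3-1b-it-reasoning-grpo",  # example path
        num_generations=4,
        per_device_train_batch_size=4,
    ),
)
trainer.train()
```

In practice the reward would also need to score answer correctness, not just formatting; the format reward is shown here because it is the part specific to the <think></think> convention.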
The adapter was evaluated on a diverse set of reasoning tasks.
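The published evaluation suite is not reproduced here; as a minimal stand-in, you can at least measure how often generations are well-formed, reusing the model and tokenizer loaded above (the test prompts are illustrative):

```python
# Minimal format-adherence check, not the adapter's published evaluation
test_prompts = [
    "Think step by step and use <think></think> tags.\nProblem: What is 15% of 80?\nResponse:",
    "Think step by step and use <think></think> tags.\nProblem: A rectangle is 3 by 7. What is its area?\nResponse:",
]

well_formed = 0
for p in test_prompts:
    batch = tokenizer(p, return_tensors="pt").to(model.device)
    out = model.generate(**batch, max_new_tokens=256, do_sample=True, temperature=0.2)
    # Decode only the newly generated tokens
    completion = tokenizer.decode(out[0][batch["input_ids"].shape[1]:], skip_special_tokens=True)
    well_formed += int("<think>" in completion and "</think>" in completion)

print(f"Well-formed reasoning: {well_formed}/{len(test_prompts)}")
```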
This adapter is part of the Ellora project, a collection of standardized recipes for enhancing LLM capabilities.
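If you prefer to ship a single checkpoint instead of base model plus adapter, PEFT can fold the LoRA weights into the base model. The output path below is just an example:

```python
# Merge the LoRA weights into the base model for standalone deployment
merged = model.merge_and_unload()  # model is the PeftModel loaded above
merged.save_pretrained("gemma-3-1b-it-reasoning-merged")    # example path
tokenizer.save_pretrained("gemma-3-1b-it-reasoning-merged")
```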