# gemma4-4b-sci

> Early-stage research experiment. Trained for 600 steps on 30K examples. Expect hallucinations and factual errors.

gemma4-4b-sci is a scientific-domain fine-tune of Gemma 4 E4B, trained with QLoRA on 30,000 examples from OpenSciLM/OS_Train_Data and SciRIFF. It is inspired by OpenScholar, but is a generation-only model without a retrieval pipeline.
## Model Description

- Developed by: Michele Banfi
- Base model: unsloth/gemma-4-E4B-it
- Method: QLoRA (4-bit) + SFT via Unsloth, language layers only (vision encoder frozen)
- Training: 600 steps, 30K examples (15K OS_Train_Data + 15K SciRIFF), NVIDIA RTX 5090
- License: Gemma Terms of Use
## Model Sources

- Repository: https://github.com/michelebanfi/gemma-4-finetuning
- Evaluation: ScholarQABench
- Ollama: `ollama run hf.co/linosium/gemma4-4b-sci`
## Quick Start

Run with Ollama:

```shell
ollama run hf.co/linosium/gemma4-4b-sci
```

Or with Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "linosium/gemma4-4b-sci",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("linosium/gemma4-4b-sci")

messages = [{"role": "user", "content": "Explain the role of CRISPR-Cas9 in gene editing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
## Evaluation

Draft results on ScholarQABench from the 600-step run. Tier 1 provides gold paper contexts, so the comparison is fair. Tier 2 requires retrieval, which this model lacks, so its citation scores there are 0 by design.

### Tier 1: single-paper tasks
| Task | Metric | gemma4-4b-sci | OpenScholar-8B |
|---|---|---|---|
| SciFact (208) | Accuracy | 77.9% | 76.4% |
| PubMedQA (843) | Accuracy | 81.5% | 76.0% |
| QASA (1375) | ROUGE-L | 20.9 | 23.0 |
| SciFact | Citation F1 | 0.0 | 68.9 |
| PubMedQA | Citation F1 | 0.0 | 43.6 |
| QASA | Citation F1 | 4.3 | 56.3 |
At 600 steps, correctness matches or exceeds OpenScholar-8B, a model with roughly twice the parameters. The citation gap is entirely due to the missing retrieval pipeline: without retrieved passages, the model has nothing to cite.

### Tier 2: synthesis tasks (no retrieval)
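The citation scores above are a precision/recall trade-off over cited paper IDs. A minimal set-level sketch (an illustrative simplification, not ScholarQABench's exact implementation, which also checks whether each citation supports its claim) shows why a model that emits no citations scores exactly 0:

```python
def citation_f1(predicted_ids, gold_ids):
    """Set-level F1 between predicted and gold citation IDs.

    Simplified stand-in for a citation metric: precision over what
    the model cited, recall over what it should have cited.
    """
    pred, gold = set(predicted_ids), set(gold_ids)
    if not pred or not gold:
        return 0.0  # no predictions (or no references): F1 is 0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# A generation-only model cites nothing, so every example scores 0:
print(citation_f1([], ["smith2020", "lee2021"]))  # 0.0
```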
| Task | gemma4-4b-sci (Citation F1) | OpenScholar-8B (Citation F1) |
|---|---|---|
| ScholarQA-CS (110) | 0.0 | 47.9 |
| ScholarQA-Bio (1451) | 0.0 | 42.8 |
| ScholarQA-Neuro (1308) | 0.0 | 50.8 |
## Citation

```bibtex
@article{asai2024openscholar,
  title   = {OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs},
  author  = {Asai, Akari and others},
  journal = {Nature},
  year    = {2024},
  url     = {https://allenai.org/blog/nature-openscilm}
}
```