# Gemma-3-1B-IT BitsAndBytesConfig NF4 Quantized
This model is a quantized version of google/gemma-3-1b-it-qat-int4-unquantized
using BitsAndBytesConfig with NF4 quantization.
## Model Details
- Base Model: google/gemma-3-1b-it-qat-int4-unquantized
- Quantization: BitsAndBytesConfig NF4 (4-bit)
- Quantization Type: NF4 with double quantization
- Compute Dtype: bfloat16
- Storage Dtype: uint8
## Quantization Configuration
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # Normal Float 4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bfloat16
    bnb_4bit_quant_storage=torch.uint8,     # packed 4-bit weights stored as uint8
)
```
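For reference, here is a minimal sketch of how such a checkpoint could be produced from the base model. The exact export script is not part of this card, and the output directory name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.uint8,
)

# Quantization happens on the fly while the BF16 base weights are loaded
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it-qat-int4-unquantized",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serialize the packed 4-bit weights so they can be re-loaded directly
model.save_pretrained("gemma-3-1b-it-qat-int4-bnb-nf4")
```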
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model (the 4-bit config is stored in the checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    "WaveCut/gemma-3-1b-it-qat-int4-bnb-nf4",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("WaveCut/gemma-3-1b-it-qat-int4-bnb-nf4")

# Generate text (move inputs to the model's device)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
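Because the base model is instruction-tuned, prompts generally work better through the tokenizer's chat template. A short sketch, reusing the `model` and `tokenizer` loaded above (the prompt text is illustrative):

```python
# Build a chat-formatted prompt for the instruction-tuned checkpoint
messages = [
    {"role": "user", "content": "Explain NF4 quantization in one paragraph."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn header
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```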
## Benefits

- Reduced Memory Usage: 4-bit weights cut the weight footprint by roughly 75% compared to 16-bit precision (see the footprint check below)
- Lower Memory Traffic: smaller weights reduce memory-bandwidth pressure, which can speed up memory-bound inference, though NF4 dequantization adds some compute overhead
- Maintained Quality: NF4's quantization levels are spaced for normally distributed weights, so quality typically degrades only slightly
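To verify the footprint on your own hardware, Transformers can report the in-memory size of a loaded model. A quick check, assuming the model is loaded as in the Usage section:

```python
# get_memory_footprint() reports bytes used by parameters and buffers
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Quantized weight footprint: {footprint_gb:.2f} GB")
```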
## Hardware Requirements

- GPU Memory: roughly 1–1.5 GB of VRAM for inference (the BF16 weights alone take about 2 GB); a quick pre-flight check follows this list
- CUDA Compatible: bitsandbytes 4-bit kernels target CUDA-capable GPUs
- CPU Fallback: CPU support for bitsandbytes 4-bit inference is limited and experimental; expect significantly reduced performance where it works at all
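A small sketch for checking GPU availability and free VRAM before loading:

```python
import torch

# Pre-flight check: is a CUDA GPU visible, and how much VRAM is free?
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # returns (free, total) in bytes
    print(f"Free VRAM: {free / 1024**3:.1f} GB of {total / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected; bitsandbytes 4-bit kernels may be unavailable.")
```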
## Quantization Details

This model uses BitsAndBytesConfig for 4-bit quantization:
- NF4 (Normal Float 4): a 4-bit data type whose quantization levels are spaced for normally distributed weights, giving a strong quality/size trade-off
- Double quantization: the per-block quantization constants are themselves quantized, saving roughly 0.4 bits per parameter
- Mixed precision: packed 4-bit weights are dequantized to bfloat16 for computation (the affected layers can be inspected directly, as shown below)
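As a sanity check, the quantized layers can be listed directly; a small sketch, assuming the model is loaded as in the Usage section:

```python
from bitsandbytes.nn import Linear4bit

# List the linear layers that were replaced by 4-bit bitsandbytes modules
quantized = [name for name, module in model.named_modules()
             if isinstance(module, Linear4bit)]
print(f"{len(quantized)} Linear4bit layers, e.g. {quantized[:3]}")
```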
## License

This model inherits its license from the base model: Google's Gemma Terms of Use.
## Model tree for WaveCut/gemma-3-1b-it-qat-int4-bnb-nf4

- Base model: google/gemma-3-1b-pt
- Finetuned: google/gemma-3-1b-it
- Quantized: google/gemma-3-1b-it-qat-int4-unquantized → this repository