# Llama 3.2 1B Book Triage - GGUF Quantized
GGUF quantized versions of the Llama 3.2 1B Book Triage model for efficient CPU inference.
## Model Description
This repository contains GGUF quantized versions of the fine-tuned Llama 3.2 1B model for rare book triage classification.
**Source model:** `ambrosfitz/llama-3.2-1b-book-triage`
## Available Quantizations
| File | Size | Description | Use Case |
|---|---|---|---|
| `model-q4_k_m.gguf` | 770 MB | **Recommended** - best balance of size and quality | General CPU inference |
| `model-q5_k_m.gguf` | 869 MB | Better quality | When quality matters more than speed |
| `model-q8_0.gguf` | 1260 MB | Highest quality | Maximum accuracy |
| `model-f16.gguf` | 2365 MB | Full precision | Benchmarking |
## Usage

### With llama.cpp
```bash
# Download a quantized model
wget https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf/resolve/main/model-q4_k_m.gguf

# Run inference
./llama-cli -m model-q4_k_m.gguf -p "Your prompt here"
```
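As an alternative to `wget`, a single file can also be fetched programmatically. The snippet below is a minimal sketch (not part of the original card) using the `huggingface_hub` client; the repo and file names match the table above.

```python
from huggingface_hub import hf_hub_download

# Download one quantized file from this repository into the local HF cache
gguf_path = hf_hub_download(
    repo_id="ambrosfitz/llama-3.2-1b-book-triage-gguf",
    filename="model-q4_k_m.gguf",
)
print(gguf_path)  # local path to pass to llama.cpp or llama-cpp-python
```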
### With Python (llama-cpp-python)
```python
from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="model-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=8
)

# Generate
output = llm("Your prompt here", max_tokens=200)
print(output['choices'][0]['text'])
```
### With Python (ctransformers)
```python
from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ambrosfitz/llama-3.2-1b-book-triage-gguf",
    model_file="model-q4_k_m.gguf",
    model_type="llama"
)

response = model("Your prompt here")
```
## Performance

Approximate inference speeds on a mid-range desktop CPU (AMD Ryzen 5 / Intel Core i5):
| Quantization | Tokens/sec | RAM Usage |
|---|---|---|
| Q4_K_M | 15-25 | ~1.5 GB |
| Q5_K_M | 12-20 | ~2 GB |
| Q8_0 | 8-15 | ~3 GB |
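To reproduce rough tokens/sec figures on your own hardware, the following is a minimal sketch using llama-cpp-python; the timing approach and parameter values are illustrative and not part of the original card.

```python
import time
from llama_cpp import Llama

# Load the quantized model (adjust n_threads to your CPU)
llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Time a single generation and compute tokens per second
start = time.perf_counter()
output = llm("Triage this book (FAST decision):", max_tokens=128)
elapsed = time.perf_counter() - start

generated = output["usage"]["completion_tokens"]
print(f"{generated / elapsed:.1f} tokens/sec")
```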
## Model Task

This model triages rare books into four categories:

- **ELIMINATE**: Not worth preserving
- **LOW_INTEREST**: Low priority
- **PROMISING**: Worth investigating
- **HIGH_INTEREST**: Top preservation priority
## Prompt Format
```
Triage this book (FAST decision):

Title: [Book Title]
Author: [Author Name]
Publisher: [Publisher]
Year: [Year]
Holdings: [N] libraries
Tier: [1-3]

Quick triage decision (JSON only):
```
### Expected Output
```json
{
  "category": "PROMISING",
  "score": 70,
  "is_thesis": false,
  "is_gov_doc": false,
  "reason": "Older book with limited holdings, potentially rare and unique."
}
```
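Putting the prompt format and expected output together, here is a minimal end-to-end sketch with llama-cpp-python. The `build_prompt` helper and the example book record are illustrative (not from the original card), and real outputs may need more robust JSON handling.

```python
import json
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Hypothetical helper that fills in the prompt template shown above
def build_prompt(title, author, publisher, year, holdings, tier):
    return (
        "Triage this book (FAST decision):\n"
        f"Title: {title}\n"
        f"Author: {author}\n"
        f"Publisher: {publisher}\n"
        f"Year: {year}\n"
        f"Holdings: {holdings} libraries\n"
        f"Tier: {tier}\n"
        "Quick triage decision (JSON only):"
    )

prompt = build_prompt("A History of Printing", "J. Smith", "Private Press", 1897, 3, 2)
output = llm(prompt, max_tokens=200, temperature=0.0)
text = output["choices"][0]["text"]

# Parse the JSON decision; fall back gracefully if the model adds extra text
try:
    decision = json.loads(text.strip())
except json.JSONDecodeError:
    decision = None
print(decision)
```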
## Citation
```bibtex
@misc{book-triage-gguf,
  author    = {ambrosfitz},
  title     = {Llama 3.2 1B Book Triage - GGUF Quantized},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf}
}
```
## Original Model

- Fine-tuned from: `unsloth/Llama-3.2-1B-Instruct`
- Merged 16-bit version: `ambrosfitz/llama-3.2-1b-book-triage`
## License
Apache 2.0 (same as base Llama 3.2 model)
## Quantization Details
- Tool: llama.cpp
- Source: 16-bit merged model
- Formats: Q4_K_M, Q5_K_M, Q8_0, F16
- Date: 2025-01-13