Llama 3.2 1B Book Triage - GGUF Quantized

GGUF quantized versions of the Llama 3.2 1B Book Triage model for efficient CPU inference.

Model Description

This repository contains GGUF quantized versions of the fine-tuned Llama 3.2 1B model for rare book triage classification.

Source Model: ambrosfitz/llama-3.2-1b-book-triage

Available Quantizations

File                 Size       Description                   Use Case
model-q4_k_m.gguf    770 MB     Recommended - best balance    General CPU inference
model-q5_k_m.gguf    869 MB     Better quality                When quality matters more
model-q8_0.gguf      1260 MB    Highest quality               Maximum accuracy
model-f16.gguf       2365 MB    Full precision                Benchmarking
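
As an alternative to downloading with wget (shown under Usage below), a single quantization can be fetched programmatically with the huggingface_hub client. A minimal sketch using the Q4_K_M file:

from huggingface_hub import hf_hub_download

# Download only the Q4_K_M file (~770 MB) from this repo into the local HF cache
model_path = hf_hub_download(
    repo_id="ambrosfitz/llama-3.2-1b-book-triage-gguf",
    filename="model-q4_k_m.gguf",
)
print(model_path)  # local path to pass to llama.cpp or llama-cpp-python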

Usage

With llama.cpp

# Download a quantized model
wget https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf/resolve/main/model-q4_k_m.gguf

# Run inference
./llama-cli -m model-q4_k_m.gguf -p "Your prompt here"
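
For the triage task itself, the multi-line prompt (see Prompt Format below) is easier to pass from a file. A sketch, assuming the prompt has been saved as prompt.txt:

# Run the triage prompt from a file, limiting generation to ~200 tokens
./llama-cli -m model-q4_k_m.gguf -f prompt.txt -n 200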

With Python (llama-cpp-python)

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="model-q4_k_m.gguf",
    n_ctx=2048,    # context window in tokens
    n_threads=8    # CPU threads; match your physical core count
)

# Generate
output = llm("Your prompt here", max_tokens=200)
print(output['choices'][0]['text'])

With Python (ctransformers)

from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ambrosfitz/llama-3.2-1b-book-triage-gguf",
    model_file="model-q4_k_m.gguf",
    model_type="llama"
)

response = model("Your prompt here")

Performance

Approximate inference speeds on CPU (AMD Ryzen 5/Intel i5):

Quantization    Tokens/sec    RAM Usage
Q4_K_M          15-25         ~1.5 GB
Q5_K_M          12-20         ~2 GB
Q8_0            8-15          ~3 GB
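
These figures vary with hardware, thread count, and prompt length. A quick way to measure throughput on your own machine, sketched with llama-cpp-python (assumes the Q4_K_M file is in the working directory):

import time
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Time a single generation and compute tokens per second
start = time.time()
output = llm("Triage this book (FAST decision):\n", max_tokens=128)
elapsed = time.time() - start

generated = output["usage"]["completion_tokens"]
print(f"{generated / elapsed:.1f} tokens/sec")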

Model Task

This model triages rare books into 4 categories:

  • ELIMINATE: Not worth preserving
  • LOW_INTEREST: Low priority
  • PROMISING: Worth investigating
  • HIGH_INTEREST: Top preservation priority

Prompt Format

Triage this book (FAST decision):

Title: [Book Title]
Author: [Author Name]
Publisher: [Publisher]
Year: [Year]
Holdings: [N] libraries
Tier: [1-3]

Quick triage decision (JSON only):

Expected Output

{
  "category": "PROMISING",
  "score": 70,
  "is_thesis": false,
  "is_gov_doc": false,
  "reason": "Older book with limited holdings, potentially rare and unique."
}
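
Putting the prompt format and expected output together, a minimal end-to-end sketch with llama-cpp-python; the book metadata below is invented for illustration:

import json
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Fill the prompt template with one book's metadata (example values)
prompt = """Triage this book (FAST decision):

Title: The History of Printing in Colonial America
Author: J. Smith
Publisher: Private Press
Year: 1893
Holdings: 3 libraries
Tier: 2

Quick triage decision (JSON only):
"""

output = llm(prompt, max_tokens=200, temperature=0.0)
text = output["choices"][0]["text"]

# The model should return a single JSON object; parse it defensively
try:
    decision = json.loads(text.strip())
    print(decision["category"], decision["score"])
except json.JSONDecodeError:
    print("Could not parse model output:", text)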

Citation

@misc{book-triage-gguf,
  author = {ambrosfitz},
  title = {Llama 3.2 1B Book Triage - GGUF Quantized},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf}
}

Original Model

Fine-tuned from: unsloth/Llama-3.2-1B-Instruct

Merged 16-bit version: ambrosfitz/llama-3.2-1b-book-triage

License

Apache 2.0 (same as base Llama 3.2 model)

Quantization Details

  • Tool: llama.cpp
  • Source: 16-bit merged model
  • Formats: Q4_K_M, Q5_K_M, Q8_0, F16
  • Date: 2025-01-13
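
The files were produced with the standard llama.cpp conversion and quantization tools. The exact commands used for this repo are not recorded here, but a typical workflow looks like this (the local model directory path is an assumption):

# Convert the merged 16-bit Hugging Face model to GGUF (F16)
python convert_hf_to_gguf.py ./llama-3.2-1b-book-triage --outfile model-f16.gguf --outtype f16

# Quantize the F16 file to the smaller formats
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
./llama-quantize model-f16.gguf model-q5_k_m.gguf Q5_K_M
./llama-quantize model-f16.gguf model-q8_0.gguf Q8_0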