Llama 3.2 1B Book Triage - GGUF Quantized

GGUF quantized versions of the Llama 3.2 1B Book Triage model for efficient CPU inference.

Model Description

This repository contains GGUF quantized versions of the fine-tuned Llama 3.2 1B model for rare book triage classification.

Source Model: ambrosfitz/llama-3.2-1b-book-triage

Available Quantizations

File                 Size       Description                   Use Case
model-q4_k_m.gguf    770 MB     Recommended - best balance    General CPU inference
model-q5_k_m.gguf    869 MB     Better quality                When quality matters more
model-q8_0.gguf      1260 MB    Highest quality               Maximum accuracy
model-f16.gguf       2365 MB    Full precision                Benchmarking
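
As an alternative to downloading with wget (shown under Usage below), a single quantization can be fetched programmatically with the huggingface_hub client. A minimal sketch using the Q4_K_M file:

from huggingface_hub import hf_hub_download

# Download only the Q4_K_M file (~770 MB) from this repo into the local HF cache
model_path = hf_hub_download(
    repo_id="ambrosfitz/llama-3.2-1b-book-triage-gguf",
    filename="model-q4_k_m.gguf",
)
print(model_path)  # local path to pass to llama.cpp or llama-cpp-python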

Usage

With llama.cpp

# Download a quantized model
wget https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf/resolve/main/model-q4_k_m.gguf

# Run inference
./llama-cli -m model-q4_k_m.gguf -p "Your prompt here"
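
For the triage task itself, the multi-line prompt (see Prompt Format below) is easier to pass from a file. A sketch, assuming the prompt has been saved as prompt.txt:

# Run the triage prompt from a file, limiting generation to ~200 tokens
./llama-cli -m model-q4_k_m.gguf -f prompt.txt -n 200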

With Python (llama-cpp-python)

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="model-q4_k_m.gguf",
    n_ctx=2048,    # context window in tokens
    n_threads=8    # CPU threads; match your physical core count
)

# Generate
output = llm("Your prompt here", max_tokens=200)
print(output['choices'][0]['text'])

With Python (ctransformers)

from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ambrosfitz/llama-3.2-1b-book-triage-gguf",
    model_file="model-q4_k_m.gguf",
    model_type="llama"
)

response = model("Your prompt here")

Performance

Approximate inference speeds on CPU (AMD Ryzen 5/Intel i5):

Quantization    Tokens/sec    RAM Usage
Q4_K_M          15-25         ~1.5 GB
Q5_K_M          12-20         ~2 GB
Q8_0            8-15          ~3 GB
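
These figures vary with hardware, thread count, and prompt length. A quick way to measure throughput on your own machine, sketched with llama-cpp-python (assumes the Q4_K_M file is in the working directory):

import time
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Time a single generation and compute tokens per second
start = time.time()
output = llm("Triage this book (FAST decision):\n", max_tokens=128)
elapsed = time.time() - start

generated = output["usage"]["completion_tokens"]
print(f"{generated / elapsed:.1f} tokens/sec")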

Model Task

This model triages rare books into 4 categories:

  • ELIMINATE: Not worth preserving
  • LOW_INTEREST: Low priority
  • PROMISING: Worth investigating
  • HIGH_INTEREST: Top preservation priority

Prompt Format

Triage this book (FAST decision):

Title: [Book Title]
Author: [Author Name]
Publisher: [Publisher]
Year: [Year]
Holdings: [N] libraries
Tier: [1-3]

Quick triage decision (JSON only):

Expected Output

{
  "category": "PROMISING",
  "score": 70,
  "is_thesis": false,
  "is_gov_doc": false,
  "reason": "Older book with limited holdings, potentially rare and unique."
}
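
Putting the prompt format and expected output together, a minimal end-to-end sketch with llama-cpp-python; the book metadata below is invented for illustration:

import json
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, n_threads=8)

# Fill the prompt template with one book's metadata (example values)
prompt = """Triage this book (FAST decision):

Title: The History of Printing in Colonial America
Author: J. Smith
Publisher: Private Press
Year: 1893
Holdings: 3 libraries
Tier: 2

Quick triage decision (JSON only):
"""

output = llm(prompt, max_tokens=200, temperature=0.0)
text = output["choices"][0]["text"]

# The model should return a single JSON object; parse it defensively
try:
    decision = json.loads(text.strip())
    print(decision["category"], decision["score"])
except json.JSONDecodeError:
    print("Could not parse model output:", text)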

Citation

@misc{book-triage-gguf,
  author = {ambrosfitz},
  title = {Llama 3.2 1B Book Triage - GGUF Quantized},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ambrosfitz/llama-3.2-1b-book-triage-gguf}
}

Original Model

Fine-tuned from: unsloth/Llama-3.2-1B-Instruct

Merged 16-bit version: ambrosfitz/llama-3.2-1b-book-triage

License

Apache 2.0 (same as base Llama 3.2 model)

Quantization Details

  • Tool: llama.cpp
  • Source: 16-bit merged model
  • Formats: Q4_K_M, Q5_K_M, Q8_0, F16
  • Date: 2025-01-13
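
The files were produced with the standard llama.cpp conversion and quantization tools. The exact commands used for this repo are not recorded here, but a typical workflow looks like this (the local model directory path is an assumption):

# Convert the merged 16-bit Hugging Face model to GGUF (F16)
python convert_hf_to_gguf.py ./llama-3.2-1b-book-triage --outfile model-f16.gguf --outtype f16

# Quantize the F16 file to the smaller formats
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
./llama-quantize model-f16.gguf model-q5_k_m.gguf Q5_K_M
./llama-quantize model-f16.gguf model-q8_0.gguf Q8_0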