# Prompt-Guard-86M — ONNX (INT8)
Built with Llama.
This repo provides an INT8-quantized ONNX export of meta-llama/Prompt-Guard-86M, ready for inference with ONNX Runtime.
## What’s inside
- model.onnx: INT8 quantized graph (ONNX Runtime dynamic quantization); a quick way to verify the quantized ops is sketched after this list
- tokenizer.json and tokenizer_config.json
- config.json
- LICENSE (Llama 3.1 Community License) and NOTICE
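If you want to confirm the graph really is the INT8 variant, here is a minimal inspection sketch. It assumes only the standard `onnx` Python package; the exact op set depends on what the quantizer emitted for this export.

```python
import onnx

model = onnx.load("model.onnx")
op_types = {node.op_type for node in model.graph.node}
# Dynamically quantized graphs typically contain ops such as
# MatMulInteger and DynamicQuantizeLinear alongside the float ops.
print(sorted(op_types))
```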
## How it was made
- Export: 🤗 Optimum ONNX exporter
- Quantization: ONNX Runtime (dynamic, per-channel where supported)
- Command: optimum-cli export onnx ... then onnxruntime.quantization ... (a sketch follows this list)
- Environment: onnxruntime==, optimum== (see the Optimum/ONNX Runtime docs for details)
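For reference, a minimal sketch of that pipeline using the Optimum Python API instead of the CLI. The output paths are illustrative, and the exact flags and versions used for this repo may differ.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export the base model to ONNX via Optimum (the Python equivalent of
# `optimum-cli export onnx` with the text-classification task).
model = ORTModelForSequenceClassification.from_pretrained(
    "meta-llama/Prompt-Guard-86M", export=True
)
model.save_pretrained("onnx")

# Dynamically quantize the weights to INT8, per-channel where supported.
quantize_dynamic(
    "onnx/model.onnx",
    "onnx/model-int8.onnx",
    per_channel=True,
    weight_type=QuantType.QInt8,
)
```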
## Usage (Python)
```python
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<you>/Prompt-Guard-86M-onnx-int8")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "your input"
enc = tok(text, return_tensors="np", padding=True, truncation=True)
# Feed only the tensors the graph declares as inputs; the tokenizer
# may emit extras (e.g. token_type_ids) that the export did not keep.
inputs = {i.name: enc[i.name] for i in session.get_inputs()}
outputs = session.run(None, inputs)
```
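To turn the raw output into labels, here is a minimal post-processing sketch. It assumes the first graph output holds the classification logits and that the bundled config.json carries the model's id2label mapping; both are typical for Optimum text-classification exports, but worth verifying against your graph.

```python
import numpy as np
from transformers import AutoConfig

logits = outputs[0]  # shape: (batch_size, num_labels)
# Numerically stable softmax over the label dimension.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

cfg = AutoConfig.from_pretrained("<you>/Prompt-Guard-86M-onnx-int8")
for row in probs:
    print(cfg.id2label[int(row.argmax())], round(float(row.max()), 4))
```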