# TinyLlama-1.1B-Chat → ONNX (FP16)
ONNX export of TinyLlama-1.1B-Chat-v1.0 (1.1B parameters, FP16 weights) with KV cache support for efficient autoregressive generation.
Converted for use with inference4j, an inference-only AI library for Java.
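The KV cache mentioned above is what makes autoregressive generation efficient: without it, every decoding step re-encodes the whole prefix, while with it each step only computes keys/values for the newest token. A toy Java sketch of the step-count difference (illustrative only; this is not the ONNX Runtime or inference4j API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why a KV cache helps autoregressive decoding:
// counting how many per-token "key" computations each strategy needs.
public class KvCacheDemo {
    static int keyComputations = 0;

    // Stand-in for projecting one token into a key/value pair.
    static int computeKey(int tokenId) {
        keyComputations++;
        return tokenId * 31;
    }

    // n decoding steps WITHOUT a cache: re-encode the prefix each step.
    static int countWithoutCache(int n) {
        keyComputations = 0;
        for (int step = 1; step <= n; step++) {
            for (int t = 0; t < step; t++) computeKey(t); // redo all previous tokens
        }
        return keyComputations; // n * (n + 1) / 2, i.e. quadratic in n
    }

    // n decoding steps WITH a cache: compute each token's key once, append it.
    static int countWithCache(int n) {
        keyComputations = 0;
        List<Integer> cache = new ArrayList<>();
        for (int step = 0; step < n; step++) cache.add(computeKey(step));
        return keyComputations; // exactly n, i.e. linear in n
    }

    public static void main(String[] args) {
        System.out.println("no cache: " + countWithoutCache(100)); // 5050
        System.out.println("cache:    " + countWithCache(100));    // 100
    }
}
```

For 100 generated tokens the cached path does 100 key computations instead of 5050, which is the speedup the exported KV-cache inputs/outputs exist to capture.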
## Original Source
- Repository: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- License: Apache 2.0
## Usage with inference4j

```java
try (var gen = OnnxTextGenerator.tinyLlama().build()) {
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}
```
## Model Details
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (1.1B parameters, 22 layers, 2048 hidden, 32 heads, 4 KV heads) |
| Task | Text generation (instruction-tuned, Zephyr chat template) |
| Precision | FP16 |
| Context length | 2048 tokens |
| Vocabulary | 32,000 tokens (SentencePiece BPE) |
| Chat template | Zephyr (`<\|system\|>` / `<\|user\|>` / `<\|assistant\|>` markers) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache, FP16) |
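Because the model is instruction-tuned on the Zephyr chat template, prompts should be wrapped in its role markers before tokenization. A minimal single-turn formatter, sketched as a plain string builder (this helper is hypothetical and not part of inference4j; a chat-aware generator may apply the template for you):

```java
// Toy sketch of the Zephyr chat template used by TinyLlama-1.1B-Chat-v1.0:
// roles are delimited by <|system|>, <|user|>, and <|assistant|> markers,
// each turn terminated by the </s> end-of-sequence token.
public class ZephyrPrompt {
    // Builds a single-turn prompt; generation continues after <|assistant|>.
    public static String format(String system, String user) {
        return "<|system|>\n" + system + "</s>\n"
             + "<|user|>\n" + user + "</s>\n"
             + "<|assistant|>\n";
    }

    public static void main(String[] args) {
        System.out.println(format("You are a helpful assistant.", "What is Java?"));
    }
}
```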
## License
This model is licensed under the Apache License 2.0. Original model by TinyLlama.