Models (2,285)

Active filters: compressed-tensors

Each entry lists: task • parameter count • last updated • downloads • likes (likes shown only when nonzero).

nm-testing/llama7b-one-shot-2_4-w4a16-packed

Text Generation • 1B • Updated Oct 9, 2024 • 2

nm-testing/tinyllama-one-shot-w4a16-group128-packed

Text Generation • 0.3B • Updated Oct 9, 2024 • 3

nm-testing/llama3-8b-w8_channel-a8_tensor-compressed

Text Generation • 8B • Updated Oct 9, 2024 • 8

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24

Text Generation • 0.9B • Updated Jun 4, 2024 • 4

nm-testing/llama7b-one-shot-2_4-w4a16-group128-packed

Text Generation • 1B • Updated Jun 4, 2024 • 4

nm-testing/llama1.1b_0.5_sparse_bitmask

Text Generation • 0.8B • Updated Oct 9, 2024 • 2

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24-t

Text Generation • 1B • Updated Oct 9, 2024 • 5.95k • 1

nm-testing/tinyllama-one-shot-w8a8-dynamic-channel

Text Generation • 1B • Updated Oct 9, 2024 • 4

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24-t-alt

Text Generation • 0.9B • Updated Oct 9, 2024 • 4

nm-testing/tinyllama-marlin24-w4a16-group128

Text Generation • 0.3B • Updated Oct 9, 2024 • 5

nm-testing/tinyllama-oneshot-w8a8-static-v2

Text Generation • 1B • Updated Oct 9, 2024 • 4

nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 6k

nm-testing/tinyllama-oneshot-w8a8-static-v3

Text Generation • 1B • Updated Jun 17, 2024 • 4

nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v3

Text Generation • 1B • Updated Jun 17, 2024 • 4

nm-testing/tinyllama-oneshot-w4a16-group128-v2

Text Generation • 0.3B • Updated Oct 9, 2024 • 5.86k

nm-testing/tinyllama-oneshot-w4a16-group128-v3

Text Generation • 0.3B • Updated Aug 19, 2024 • 3

nm-testing/tinyllama-oneshot-w4a16-channel-v2

Text Generation • 0.3B • Updated Oct 9, 2024 • 6.15k • 1

nm-testing/tinyllama-oneshot-w4a16-channel-v3

Text Generation • 0.3B • Updated Jun 7, 2024 • 5

nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change

Text Generation • 1B • Updated Oct 9, 2024 • 23.3k

nm-testing/tinyllama-oneshot-w8a8-test-static-shape-change-v3

Text Generation • 1B • Updated Aug 30, 2024 • 4

nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 6.06k

nm-testing/tinyllama-oneshot-w8-channel-a8-tensor

Text Generation • 1B • Updated Oct 9, 2024 • 5.99k

nm-testing/llama-3-instruct-w8a8-dyn-per-token-test

Text Generation • 8B • Updated Oct 9, 2024 • 2

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token

Text Generation • 8B • Updated Oct 9, 2024 • 4

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token-2048-Samples

Text Generation • 8B • Updated Oct 9, 2024 • 2

nm-testing/tinyllama-oneshot-w8a16-per-channel

Text Generation • 0.4B • Updated Oct 9, 2024 • 5.85k

nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test

Text Generation • 8B • Updated Oct 9, 2024 • 65

nm-testing/Meta-Llama-3-8B-Instruct-W4-Group128-A16-Test

Text Generation • 2B • Updated Oct 9, 2024 • 4

RedHatAI/Phi-3-mini-128k-instruct-FP8

Text Generation • 4B • Updated Oct 9, 2024 • 24

RedHatAI/Phi-3-medium-128k-instruct-FP8

Text Generation • 14B • Updated Oct 9, 2024 • 2.07k • 5