---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging
Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/gemma-3n-E4B
tags:
- automatic-speech-recognition
- automatic-speech-translation
- audio-text-to-text
- video-text-to-text
- mlx
---
# NexaAI/gemma-3n-E4B-it-4bit-MLX
## Quickstart
Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed. In the nexa-sdk CLI:
```bash
NexaAI/gemma-3n-E4B-it-4bit-MLX
```
## Overview
A summary description of the model and a brief definition of its inputs and outputs.
#### Description
Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
Gemma 3n models are designed for efficient execution on low-resource devices.
They are capable of multimodal input, handling text, image, video, and audio
input, and generating text outputs, with open weights for pre-trained and
instruction-tuned variants. These models were trained with data in over 140
spoken languages.
Gemma 3n models use selective parameter activation technology to reduce resource
requirements. This technique allows the models to operate at an effective size
of 2B and 4B parameters, which is lower than the total number of parameters they
contain. For more information on Gemma 3n's efficient parameter management
technology, see the
[Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n#parameters)
page.
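As a loose illustration only (not Gemma 3n's actual mechanism, and with hypothetical parameter counts), selective parameter activation means a forward pass touches only a subset of the stored parameters, so the per-pass working set tracks the active subset rather than the total:

```python
# Toy sketch of "effective size" under selective parameter activation.
# Illustrative only: the counts and the 4-bit assumption are hypothetical,
# not Gemma 3n's real figures.

TOTAL_PARAMS = 8_000_000_000   # hypothetical total parameters stored
ACTIVE_PARAMS = 4_000_000_000  # hypothetical parameters active per pass

def working_set_gb(active_params: int, bytes_per_param: float = 0.5) -> float:
    """Rough per-pass weight memory: 4-bit quantization ~= 0.5 bytes/param."""
    return active_params * bytes_per_param / 1e9

# Only the active subset determines the per-pass weight memory.
print(f"if all params loaded: {working_set_gb(TOTAL_PARAMS):.1f} GB")
print(f"active subset only:   {working_set_gb(ACTIVE_PARAMS):.1f} GB")
```

The point of the sketch is only the accounting: a model can store more parameters than any single forward pass activates, which is why the E2B/E4B models behave like 2B/4B-parameter models at inference time.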
#### Inputs and outputs
- **Input:**
- Text string, such as a question, a prompt, or a document to be
summarized
- Images, normalized to 256x256, 512x512, or 768x768 resolution
and encoded to 256 tokens each
- Audio data encoded to 6.25 tokens per second from a single channel
- Total input context of 32K tokens
- **Output:**
- Generated text in response to the input, such as an answer to a
question, analysis of image content, or a summary of a document
- Total output length up to 32K tokens, subtracting the request
input tokens
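The figures above imply a simple token budget. A quick back-of-the-envelope check (the helper function is illustrative, using the stated rates of 256 tokens per image and 6.25 tokens per second of audio against the 32K context):

```python
# Back-of-the-envelope input token budget for Gemma 3n, using the rates
# stated above: 256 tokens per image, 6.25 tokens per second of audio,
# and a 32K-token total context shared by input and output.

CONTEXT_TOKENS = 32 * 1024
TOKENS_PER_IMAGE = 256
AUDIO_TOKENS_PER_SECOND = 6.25

def remaining_budget(text_tokens: int, images: int = 0,
                     audio_seconds: float = 0.0) -> int:
    """Tokens left in the context window after encoding this input."""
    used = (text_tokens
            + images * TOKENS_PER_IMAGE
            + int(audio_seconds * AUDIO_TOKENS_PER_SECOND))
    return CONTEXT_TOKENS - used

# e.g. a 1,000-token prompt with two images and one minute of audio
# uses 1,000 + 512 + 375 = 1,887 tokens:
print(remaining_budget(1_000, images=2, audio_seconds=60))  # 30881
```

Since output length is capped at the context size minus the input tokens, the returned value is also an upper bound on the response length for that request.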
## Benchmark Results
These models were evaluated at full precision (float32) against a large
collection of different datasets and metrics to cover different aspects of
content generation. Evaluation results marked with **IT** are for
instruction-tuned models. Evaluation results marked with **PT** are for
pre-trained models.
#### Reasoning and factuality
| Benchmark | Metric | n-shot | E2B PT | E4B PT |
| ------------------------------ |----------------|----------|:--------:|:--------:|
| [HellaSwag][hellaswag] | Accuracy | 10-shot | 72.2 | 78.6 |
| [BoolQ][boolq] | Accuracy | 0-shot | 76.4 | 81.6 |
| [PIQA][piqa] | Accuracy | 0-shot | 78.9 | 81.0 |
| [SocialIQA][socialiqa] | Accuracy | 0-shot | 48.8 | 50.0 |
| [TriviaQA][triviaqa] | Accuracy | 5-shot | 60.8 | 70.2 |
| [Natural Questions][naturalq] | Accuracy | 5-shot | 15.5 | 20.9 |
| [ARC-c][arc] | Accuracy | 25-shot | 51.7 | 61.6 |
| [ARC-e][arc] | Accuracy | 0-shot | 75.8 | 81.6 |
| [WinoGrande][winogrande] | Accuracy | 5-shot | 66.8 | 71.7 |
| [BIG-Bench Hard][bbh] | Accuracy | few-shot | 44.3 | 52.9 |
| [DROP][drop] | Token F1 score | 1-shot | 53.9 | 60.8 |
[hellaswag]: https://arxiv.org/abs/1905.07830
[boolq]: https://arxiv.org/abs/1905.10044
[piqa]: https://arxiv.org/abs/1911.11641
[socialiqa]: https://arxiv.org/abs/1904.09728
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161
#### Multilingual
| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|-------------------------|----------|:--------:|:--------:|
| [MGSM][mgsm] | Accuracy | 0-shot | 53.1 | 60.7 |
| [WMT24++][wmt24pp] (ChrF) | Character-level F-score | 0-shot | 42.7 | 50.1 |
| [Include][include] | Accuracy | 0-shot | 38.6 | 57.2 |
| [MMLU][mmlu] (ProX) | Accuracy | 0-shot | 8.1 | 19.9 |
| [OpenAI MMLU][openai-mmlu] | Accuracy | 0-shot | 22.3 | 35.6 |
| [Global-MMLU][global-mmlu] | Accuracy | 0-shot | 55.1 | 60.3 |
| [ECLeKTic][eclektic] | ECLeKTic score | 0-shot | 2.5 | 1.9 |
[mgsm]: https://arxiv.org/abs/2210.03057
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[include]:https://arxiv.org/abs/2411.19799
[mmlu]: https://arxiv.org/abs/2009.03300
[openai-mmlu]: https://huggingface.co/datasets/openai/MMMLU
[global-mmlu]: https://huggingface.co/datasets/CohereLabs/Global-MMLU
[eclektic]: https://arxiv.org/abs/2502.21228
#### STEM and code
| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|--------------------------|----------|:--------:|:--------:|
| [GPQA][gpqa] Diamond                | Relaxed accuracy         | 0-shot   |   24.8   |   23.7   |
| [LiveCodeBench][lcb] v5 | pass@1 | 0-shot | 18.6 | 25.7 |
| Codegolf v2.2 | pass@1 | 0-shot | 11.0 | 16.8 |
| [AIME 2025][aime-2025] | Accuracy | 0-shot | 6.7 | 11.6 |
[gpqa]: https://arxiv.org/abs/2311.12022
[lcb]: https://arxiv.org/abs/2403.07974
[aime-2025]: https://www.vals.ai/benchmarks/aime-2025-05-09
#### Additional benchmarks
| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------ |------------|----------|:--------:|:--------:|
| [MMLU][mmlu] | Accuracy | 0-shot | 60.1 | 64.9 |
| [MBPP][mbpp] | pass@1 | 3-shot | 56.6 | 63.6 |
| [HumanEval][humaneval] | pass@1 | 0-shot | 66.5 | 75.0 |
| [LiveCodeBench][lcb] | pass@1 | 0-shot | 13.2 | 13.2 |
| HiddenMath | Accuracy | 0-shot | 27.7 | 37.7 |
| [Global-MMLU-Lite][global-mmlu-lite] | Accuracy | 0-shot | 59.0 | 64.5 |
| [MMLU][mmlu] (Pro) | Accuracy | 0-shot | 40.5 | 50.6 |
[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite
## Reference
**Original model card**: [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)