Nemotron-Elastic-12B
Model Developer: NVIDIA
Model Dates:
November 2025
Data Freshness:
September 2024
The pretraining data has a cutoff date of September 2024.
Model Overview
NVIDIA Nemotron-Elastic-12B is a large language model (LLM) developed by NVIDIA for research purposes. This model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers, designed to enable elastic inference through nested model extraction.
The model was post-trained from NVIDIA-Nemotron-Nano-12B-v2, incorporating advanced reasoning capabilities and optimized for mathematical and scientific reasoning tasks.
A key innovation of this model is its Elastic Architecture, which enables the extraction of smaller, nested variants (6B and 9B parameters) from the same parameter space without requiring separate training runs.
This model is for research and development only.
The figure below illustrates the overall training and deployment pipelines of Nemotron-Elastic.
This approach provides significant advantages over traditional model compression methods:
Cost Efficiency Benefits
Training Token Savings: Nemotron Elastic achieves a 7.2× reduction in training tokens compared to traditional compression methods. While approaches like Minitron require separate exploratory and knowledge distillation phases for each target size (750B tokens for 6B+9B variants), Nemotron Elastic trains all variants simultaneously in a single run requiring only 110B tokens.
Deployment Memory Efficiency: The nested weight-sharing architecture provides substantial memory advantages. Deploying all three model variants (6B, 9B, and 12B) requires only 24GB memory - equivalent to storing just the 12B model alone. This represents a 42% memory reduction compared to storing separate 9B and 12B checkpoints (42GB), while providing an additional 6B variant at no extra cost.
| Configuration | Models | Total Memory (BF16) |
|---|---|---|
| Nemotron Elastic | 6B + 9B + 12B | 24 GB |
| NanoV2 | 9B + 12B | 42 GB |
Scalable Architecture: Unlike traditional compression methods (such as Minitron-SSM, which was used to create the 9B variant from the 12B NanoV2 model) where costs scale linearly with the number of target model sizes, Nemotron Elastic maintains approximately constant training and memory overhead regardless of how many nested variants are extracted. This scalability makes it particularly valuable for edge deployment scenarios that require multiple model sizes to handle varying workloads or user-selected quality-latency tradeoffs.
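As a rough sanity check of the deployment numbers above, here is a minimal back-of-the-envelope sketch (assuming 2 bytes per parameter in BF16 and ignoring optimizer state, caches, and activation memory):

# Approximate BF16 checkpoint sizes: nested variants share the 12B weights,
# so deploying 6B + 9B + 12B costs no more than the 12B checkpoint alone.
BYTES_PER_PARAM_BF16 = 2

def checkpoint_gb(params_billions: float) -> float:
    return params_billions * 1e9 * BYTES_PER_PARAM_BF16 / 1e9

nested = checkpoint_gb(12)                       # ~24 GB for nested 6B + 9B + 12B (weights shared)
separate = checkpoint_gb(9) + checkpoint_gb(12)  # ~42 GB for standalone 9B + 12B checkpoints
print(f"Nested: {nested:.0f} GB, separate: {separate:.0f} GB")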
License/Terms of Use
GOVERNING TERMS: Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License.
Model Architecture
- Architecture Type: Mamba2-Transformer Hybrid
- Network Architecture: Nemotron-Hybrid
- Number of Parameters: 12B Elastic (encapsulates 6B and 9B)
Deployment Geography: Global
Use Case: This model is intended for researchers studying elastic inference, hybrid architectures, mathematical reasoning, and AI systems that require flexible computational resource allocation.
Release Date:
Huggingface: 11/19/2025 via https://huggingface.co/nvidia/Nemotron-Elastic-12B
Elastic Model Variants
The Nemotron-Elastic-12B model supports extraction of smaller nested variants:
- Nemotron-Elastic-6B: 6B parameter variant extracted from the 12B model
- Nemotron-Elastic-9B: 9B parameter variant extracted from the 12B model
These variants are extracted using the provided slicing script at slice_nemotron_elastic.py.
Input
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 128K tokens. Supports English as well as multiple additional languages.
Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences up to 128K
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
- Runtime Engine(s): HF
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB
- Operating System(s): Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version
- v1.0
Prompt Format
We follow the Jinja chat template provided below. The template adds <think>\n to the start of the Assistant response if /think is found in the system prompt or any user message, and adds <think></think> instead if /no_think is found, thereby enforcing reasoning on/off behavior. If neither signal is present, the model defaults to reasoning "on" mode.
{%- set ns = namespace(enable_thinking = true) %}
{%- for message in messages -%}
{%- set content = message['content'] -%}
{%- if message['role'] == 'user' or message['role'] == 'system' -%}
{%- if '/think' in content -%}
{%- set ns.enable_thinking = true -%}
{%- elif '/no_think' in content -%}
{%- set ns.enable_thinking = false -%}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- if messages[0]['role'] != 'system' -%}
{%- set ns.non_tool_system_content = '' -%}
{{- '<SPECIAL_10>System\n' -}}
{%- else -%}
{%- set ns.non_tool_system_content = messages[0]['content']
.replace('/think', '')
.replace('/no_think', '')
.strip()
-%}
{{- '<SPECIAL_10>System\n' + ns.non_tool_system_content }}
{%- endif -%}
{%- if tools -%}
{%- if ns.non_tool_system_content is defined and ns.non_tool_system_content != '' -%}
{{- '\n\n' -}}
{%- endif -%}
{{- 'You can use the following tools to assist the user if required:' -}}
{{- '\n<AVAILABLE_TOOLS>[' -}}
{%- for tool in tools -%}
{{- (tool.function if tool.function is defined else tool) | tojson -}}
{{- ', ' if not loop.last else '' -}}
{%- endfor -%}
{{- ']</AVAILABLE_TOOLS>\n\n' -}}
{{- 'If you decide to call any tool(s), use the following format:\n' -}}
{{- '<TOOLCALL>[{{"name": "tool_name1", "arguments": "tool_args1"}}, ' -}}
{{- '{{"name": "tool_name2", "arguments": "tool_args2"}}]</TOOLCALL>\n\n' -}}
{{- 'The user will execute tool-calls and return responses from tool(s) in this format:\n' -}}
{{- '<TOOL_RESPONSE>[{{"tool_response1"}}, {{"tool_response2"}}]</TOOL_RESPONSE>\n\n' -}}
{{- 'Based on the tool responses, you can call additional tools if needed, correct tool calls if any errors are found, or just respond to the user.' -}}
{%- endif -%}
{{- '\n' -}}
{%- set messages = messages[1:] if messages[0]['role'] == 'system' else messages -%}
{%- if messages[-1]['role'] == 'assistant' -%}
{%- set ns.last_turn_assistant_content = messages[-1]['content'].strip() -%}
{%- set messages = messages[:-1] -%}
{%- endif -%}
{%- for message in messages -%}
{%- set content = message['content'] -%}
{%- if message['role'] == 'user' -%}
{{- '<SPECIAL_11>User\n' + content.replace('/think', '').replace('/no_think', '').strip() + '\n' }}
{%- elif message['role'] == 'tool' -%}
{%- if loop.first or (messages[loop.index0 - 1].role != 'tool') -%}
{{- '<SPECIAL_11>User\n' + '<TOOL_RESPONSE>[' }}
{%- endif -%}
{{- message['content'] -}}
{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == 'tool') else '' -}}
{%- if loop.last or (messages[loop.index0 + 1].role != 'tool') -%}
{{- ']</TOOL_RESPONSE>\n' -}}
{%- endif -%}
{%- elif message['role'] == 'assistant' -%}
{%- if '</think>' in content -%}
{%- set content = content.split('</think>')[1].strip() %}
{%- endif -%}
{{- '<SPECIAL_11>Assistant\n' + content.strip() }}
{%- if message.tool_calls -%}
{%- if content.strip() != '' -%}
{{- '\n\n' -}}
{%- endif -%}
{{- '<TOOLCALL>[' -}}
{%- for call in message.tool_calls -%}
{%- set fn = call.function if call.function is defined else call -%}
{{- '{"name": "' + fn.name + '", "arguments": ' -}}
{%- if fn.arguments is string -%}
{{- fn.arguments -}}
{%- else -%}
{{- fn.arguments | tojson -}}
{%- endif -%}
{{- '}' + (', ' if not loop.last else '') -}}
{%- endfor -%}
{{- ']</TOOLCALL>' -}}
{%- endif -%}
{{- '\n<SPECIAL_12>\n' -}}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{- '<SPECIAL_11>Assistant\n' -}}
{%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}
{{- '<think></think>' -}}
{%- else -%}
{{- '<think>\n' -}}
{%- endif -%}
{%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}
{{- ns.last_turn_assistant_content -}}
{%- endif -%}
{%- else -%}
{%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}
{{- '<SPECIAL_11>Assistant\n' -}}
{%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}
{{- '<think></think>' -}}
{%- else -%}
{{- '<think>\n' -}}
{%- endif -%}
{{- ns.last_turn_assistant_content -}}
{%- if continue_final_message is defined -%}
{%- if continue_final_message is false -%}
{{- '\n<SPECIAL_12>\n' -}}
{%- endif -%}
{%- else -%}
{{- '\n<SPECIAL_12>\n' -}}
{%- endif -%}
{%- endif -%}
{%- endif -%}
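To illustrate how the reasoning controls interact with this template, here is a minimal sketch using the Hugging Face tokenizer (the message contents are examples only; rendering assumes the chat template above is bundled with the tokenizer):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Elastic-12B", trust_remote_code=True)

# Default: reasoning "on", so the rendered prompt ends with "<SPECIAL_11>Assistant\n<think>\n".
messages_on = [{"role": "user", "content": "What is 2 + 2?"}]
print(tokenizer.apply_chat_template(messages_on, tokenize=False, add_generation_prompt=True))

# Adding /no_think to the system prompt (or any user message) switches reasoning off,
# so the rendered prompt ends with "<think></think>" instead.
messages_off = [
    {"role": "system", "content": "You are a concise assistant. /no_think"},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(tokenizer.apply_chat_template(messages_off, tokenize=False, add_generation_prompt=True))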
Training, Testing, and Evaluation Datasets
Training datasets
- Data Modality: Text
- Text Training Data Size: More than 10 Trillion Tokens
- Train/Test/Valid Split: We used 100% of the corpus for pre-training and relied on external benchmarks for testing.
- Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
Properties: The post-training corpus for NVIDIA-Nemotron-Nano-12B-v2 consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese). Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, and Qwen 2.5 72B.
The pre-training corpus for NVIDIA-Nemotron-Nano-12B-v2 consists of high-quality curated and synthetically generated data. It covers English as well as 15 additional natural languages and 43 programming languages. Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. The model was pre-trained on approximately twenty trillion tokens.
Alongside the model, we release our final pretraining data, as outlined in this section. For ease of analysis, a sample set is available ungated. For the remaining code, math, and multilingual data, gating and approval are required, and the dataset is permissively licensed for model training purposes.
More details on the datasets and synthetic data generation methods can be found in the technical report NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model.
Public Datasets
Private Non-publicly Accessible Datasets of Third Parties
| Dataset |
|---|
| Global Regulation |
| Workbench |
Online Dataset Sources
The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper.
Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied only heuristic filtering instead, similar to what we did for lower-quality English data in the Nemotron-CC pipeline, but selectively removed some filters for languages where they did not work well. Deduplication was done in the same way as for Nemotron-CC.
The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API. Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any having a license which does not exist in our permissive-license set (for additional details, refer to the technical report).
| Dataset | Modality | Dataset Size (Tokens) | Collection Period |
|---|---|---|---|
| English Common Crawl | Text | 3.360T | 4/8/2025 |
| Multilingual Common Crawl | Text | 812.7B | 5/1/2025 |
| GitHub Crawl | Text | 747.4B | 4/29/2025 |
NVIDIA-Sourced Synthetic Datasets
| Dataset | Modality | Dataset Size (Tokens) | Seed Dataset | Model(s) used for generation |
|---|---|---|---|---|
| Synthetic Art of Problem Solving from DeepSeek-R1 | Text | 25.5B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; | DeepSeek-R1 |
| Synthetic Moral Stories and Social Chemistry from Mixtral-8x22B-v0.1 | Text | 327M | social-chemestry-101; Moral Stories | Mixtral-8x22B-v0.1 |
| Synthetic Social Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 83.6M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic Health Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 9.7M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic STEM seeded with OpenStax, Open Textbook Library, and GSM8K from DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, and Qwen2.5-72B | Text | 175M | OpenStax - CC BY-SA subset; GSM8K; Open Textbook Library - CC BY-SA & GNU subset | DeepSeek-R1, DeepSeek-V3; DeepSeek-V3-0324; Qwen2.5-72B |
| Nemotron-PrismMath | Text | 4.6B | Big-Math-RL-Verified; OpenR1-Math-220k | Qwen2.5-0.5B-instruct, Qwen2.5-72B-Instruct; DeepSeek-R1-Distill-Qwen-32B |
| Synthetic Question Answering Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 350M | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic FineMath-4+ Reprocessed from DeepSeek-V3 | Text | 9.2B | Common Crawl | DeepSeek-V3 |
| Synthetic FineMath-3+ Reprocessed from phi-4 | Text | 27.6B | Common Crawl | phi-4 |
| Synthetic Union-3+ Reprocessed from phi-4 | Text | 93.1B | Common Crawl | phi-4 |
| Refreshed Nemotron-MIND from phi-4 | Text | 73B | Common Crawl | phi-4 |
| Synthetic Union-4+ Reprocessed from phi-4 | Text | 14.12B | Common Crawl | phi-4 |
| Synthetic Union-3+ minus 4+ Reprocessed from phi-4 | Text | 78.95B | Common Crawl | phi-4 |
| Synthetic Union-3 Refreshed from phi-4 | Text | 80.94B | Common Crawl | phi-4 |
| Synthetic Union-4+ Refreshed from phi-4 | Text | 52.32B | Common Crawl | phi-4 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from DeepSeek-V3 and DeepSeek-V3-0324 | Text | 4.0B | AQUA-RAT; LogiQA; AR-LSAT | DeepSeek-V3; DeepSeek-V3-0324 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from Qwen3-30B-A3B | Text | 4.2B | AQUA-RAT; LogiQA; AR-LSAT | Qwen3-30B-A3B |
| Synthetic Art of Problem Solving from Qwen2.5-32B-Instruct, Qwen2.5-Math-72B, Qwen2.5-Math-7B, and Qwen2.5-72B-Instruct | Text | 83.1B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; GSM8K; PRM800K | Qwen2.5-32B-Instruct; Qwen2.5-Math-72B; Qwen2.5-Math-7B; Qwen2.5-72B-Instruct |
| Synthetic MMLU Auxiliary Train from DeepSeek-R1 | Text | 0.5B | MMLU Auxiliary Train | DeepSeek-R1 |
| Synthetic Long Context Continued Post-Training Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 5.4B | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic Common Crawl from Qwen3-30B-A3B and Mistral-Nemo-12B-Instruct | Text | 1.949T | Common Crawl | Qwen3-30B-A3B; Mistral-NeMo-12B-Instruct |
| Synthetic Multilingual Data from Common Crawl from Qwen3-30B-A3B | Text | 997.3B | Common Crawl | Qwen3-30B-A3B |
| Synthetic Multilingual Data from Wikimedia from Qwen3-30B-A3B | Text | 55.1B | Wikimedia | Qwen3-30B-A3B |
| Synthetic OpenMathReasoning from DeepSeek-R1-0528 | Text | 1.5M | OpenMathReasoning | DeepSeek-R1-0528 |
| Synthetic OpenCodeReasoning from DeepSeek-R1-0528 | Text | 1.1M | OpenCodeReasoning | DeepSeek-R1-0528 |
| Synthetic Science Data from DeepSeek-R1-0528 | Text | 1.5M | - | DeepSeek-R1-0528 |
| Synthetic Humanity's Last Exam from DeepSeek-R1-0528 | Text | 460K | Humanity's Last Exam | DeepSeek-R1-0528 |
| Synthetic ToolBench from Qwen3-235B-A22B | Text | 400K | ToolBench | Qwen3-235B-A22B |
| Synthetic Nemotron Content Safety Dataset V2, eval-safety, Gretel Synthetic Safety Alignment, and RedTeam_2K from DeepSeek-R1-0528 | Text | 52K | Nemotron Content Safety Dataset V2; eval-safety; Gretel Synthetic Safety Alignment; RedTeam_2K | DeepSeek-R1-0528 |
| Synthetic HelpSteer from Qwen3-235B-A22B | Text | 120K | HelpSteer3; HelpSteer2 | Qwen3-235B-A22B |
| Synthetic Alignment data from Mixtral-8x22B-Instruct-v0.1, Mixtral-8x7B-Instruct-v0.1, and Nemotron-4 Family | Text | 400K | HelpSteer2; C4; LMSYS-Chat-1M; ShareGPT52K; tigerbot-kaggle-leetcodesolutions-en-2k; GSM8K; PRM800K; lm_identity (NVIDIA internal); FinQA; WikiTableQuestions; Riddles; ChatQA nvolve-multiturn (NVIDIA internal); glaive-function-calling-v2; SciBench; OpenBookQA; Advanced Reasoning Benchmark; Public Software Heritage S3; Khan Academy Math Keywords | Nemotron-4-15B-Base (NVIDIA internal); Nemotron-4-15B-Instruct (NVIDIA internal); Nemotron-4-340B-Base; Nemotron-4-340B-Instruct; Nemotron-4-340B-Reward; Mixtral-8x7B-Instruct-v0.1; Mixtral-8x22B-Instruct-v0.1 |
| Synthetic LMSYS-Chat-1M from Qwen3-235B-A22B | Text | 1M | LMSYS-Chat-1M | Qwen3-235B-A22B |
| Synthetic Multilingual Reasoning data from DeepSeek-R1-0528, Qwen2.5-32B-Instruct-AWQ, and Qwen2.5-14B-Instruct | Text | 25M | OpenMathReasoning; OpenCodeReasoning | DeepSeek-R1-0528; Qwen2.5-32B-Instruct-AWQ (translation); Qwen2.5-14B-Instruct (translation); |
| Synthetic Multilingual Reasoning data from Qwen3-235B-A22B and Gemma 3 Post-Trained models | Text | 5M | WildChat | Qwen3-235B-A22B; Gemma 3 PT 12B; Gemma 3 PT 27B |
Evaluation Dataset:
- Data Collection Method by dataset: Hybrid: Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
Benchmark Results
Reasoning Evaluations (Reasoning ON)
The following table shows performance across key reasoning and mathematical benchmarks. All Nemotron-Elastic variants and the NanoV2 baselines are checkpoints taken at the end of the 49k-context distillation run (prior to RL and checkpoint merging); the other models are their final, public versions.
The accuracy shown is the average across all benchmarks: MATH-500, AIME-2024, AIME-2025, GPQA, LiveCodeBench v5, and MMLU-Pro.
Benchmark Descriptions:
- MATH-500: A subset of 500 questions from the MATH benchmark testing mathematical problem-solving capabilities.
- AIME-2024/2025: American Invitational Mathematics Examination problems testing advanced mathematical reasoning.
- GPQA: Graduate-level Google-Proof Q&A dataset testing scientific reasoning.
- LiveCodeBench v5: Real-world coding problems testing programming and algorithmic thinking.
- MMLU-Pro: Enhanced version of MMLU testing knowledge across multiple domains.
Elastic Model Extraction
The model supports extraction of nested variants using the provided slicing script:
python slice_nemotron_elastic.py \
--model_path <path to 12b model> \
--slice_size 6b \
--save_path ./nemotron-elastic-6b
python slice_nemotron_elastic.py \
--model_path <path to 12b model> \
--slice_size 9b \
--save_path ./nemotron-elastic-9b
The slicing process preserves the hybrid architecture while reducing model size through structured pruning of embedding dimensions and MLP layers.
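Once sliced, the extracted checkpoints can be loaded like any other Hugging Face model. A minimal sketch, assuming the output path from the 6B command above and that the slicing script writes a standard Transformers checkpoint:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

sliced_path = "./nemotron-elastic-6b"  # --save_path used in the 6B slicing command above
tokenizer = AutoTokenizer.from_pretrained(sliced_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(sliced_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()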
Potential Known Risks for Usage
The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even if the prompt itself does not contain anything explicitly offensive. Code produced by the model may not always reflect real-world contexts and should be checked. The model is also susceptible to alignment-breaking attacks. Users are advised to deploy language model guardrails alongside this model to prevent potentially harmful outputs.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use.
Please report security vulnerabilities or NVIDIA AI Concerns here.
Research Applications
This model is particularly suitable for research in:
- Elastic Inference: Studying adaptive model sizing based on computational constraints
- Hybrid Architectures: Exploring the combination of Mamba-2 and Transformer layers
- Model Compression: Understanding structured pruning and nested model extraction
- Resource-Adaptive AI: Developing systems that can scale computational requirements dynamically
Citation
@misc{nemotron-elastic-12b-2025,
title={Nemotron Elastic},
author={NVIDIA},
year={2025},
note={Research release for studying elastic inference and hybrid architectures}
}
Example Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model (full 12B version)
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Elastic-12B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-Elastic-12B", torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
# Use the prompt template
messages = [
{"role": "system", "content": "You are a helpful mathematical reasoning assistant"},
{"role": "user", "content": "Solve the following equation: 2x + 5 = 15"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(tokenized_chat, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
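If you only need the newly generated portion of the output, a small variation on the decoding step above (assuming the standard layout returned by generate, where the prompt tokens precede the new tokens):

# Strip the prompt tokens and decode only the model's response.
new_tokens = outputs[0][tokenized_chat.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))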
Note: This example uses the full 12B model directly. Alternatively, you can extract smaller variants (6B or 9B) using the slicing script mentioned above if you need reduced computational requirements for your specific deployment scenario.