
NVIDIA Open Model License Agreement

Version Release Date: September 23, 2025
This NVIDIA Open Model License Agreement (the “Agreement”) is a legal agreement between the Legal Entity You represent, or if no entity is identified, You and NVIDIA Corporation and its Affiliates (“NVIDIA”) and governs Your use of the Models that NVIDIA provides to You under this Agreement. NVIDIA and You are each a “party” and collectively the “parties.”
NVIDIA models released under this Agreement are intended to be used permissively and enable the further development of AI technologies. Subject to the terms of this Agreement, NVIDIA confirms that:

  • Models are commercially usable.
  • You are free to create and distribute Derivative Models.
  • NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.

By using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model or Derivative Model, or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.

1. Definitions

1.1. Derivative Model means all (a) modifications to the Model, (b) works based on the Model, and (c) any other derivative works of the Model. An output is not a Derivative Model.
1.2. Legal Entity means the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, “control” means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of fifty percent (50%) or more of the outstanding shares, or (c) beneficial ownership of such entity.
1.3. Model means the machine learning model, software, checkpoints, learnt weights, algorithms, parameters, configuration files and documentation shared under this Agreement.
1.4. NVIDIA Cosmos Model means a multimodal Model shared under this Agreement.
1.5. Special-Purpose Model means a Model that is only competent in a narrow set of purpose-specific tasks and should not be used for unintended or general-purpose applications.
1.6. You or Your means an individual or Legal Entity exercising permissions granted by this Agreement.

2. Conditions for Use, License Grant, AI Ethics and IP Ownership

2.1. Conditions for Use

  • The Model and any Derivative Model are subject to additional terms as described in Section 2 and Section 3 of this Agreement.
  • If You institute copyright or patent litigation against any entity alleging that the Model or a Derivative Model constitutes infringement, then any licenses granted to You under this Agreement will terminate as of the date such litigation is filed.
  • If You bypass or disable any technical limitation, safety guardrail, encryption, DRM, or authentication mechanism contained in the Model without replacing it with a substantially similar guardrail, Your rights under this Agreement will terminate.
  • NVIDIA may designate a Model as a Special-Purpose Model.
  • NVIDIA may update this Agreement to comply with legal and regulatory requirements.

2.2. License Grant NVIDIA grants You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, revocable license to publicly perform, publicly display, reproduce, use, create derivative works of, make, have made, sell, offer for sale, distribute, and import the Model.

2.3. AI Ethics Use of the Models must be consistent with NVIDIA’s Trustworthy AI terms.

2.4. IP Ownership

  • NVIDIA owns the Model and any Derivative Models it creates.
  • You own Your Derivative Models.
  • NVIDIA claims no ownership rights in outputs.
  • Except as expressly granted, NVIDIA reserves all rights.

3. Redistribution

You may reproduce and distribute copies of the Model or Derivative Models in any medium, with or without modifications, provided that:

3.1. You must provide recipients with a copy of this Agreement and include the following attribution in a “Notice” text file:

    “Licensed by NVIDIA Corporation under the NVIDIA Open Model License”

3.2. If distributing or making available an NVIDIA Cosmos Model, or a product or service derived from it, you must also include:

    “Built on NVIDIA Cosmos”

3.3. You may add your own copyright statements and license terms for your modifications, provided use still complies with this Agreement.

4. Separate Components The Models may include components licensed under separate legal notices (e.g., open source software licenses). Those terms govern the corresponding components and apply except where overridden by this Agreement, unless the third-party license terms require otherwise.

5. Trademarks No permission is granted to use NVIDIA’s trade names, trademarks, or product names, except for reasonable descriptive use.

6. Disclaimer of Warranty The Model is provided “AS IS”, without warranties of any kind, including title, non-infringement, merchantability, or fitness for a particular purpose. You assume the risks associated with its use.

7. Limitation of Liability NVIDIA is not liable for damages (direct, indirect, incidental, or consequential) arising from use of the Model, unless required by law.

8. Indemnity You will indemnify and hold NVIDIA harmless against claims from third parties arising from your use or distribution of the Model, Derivative Models, or outputs.

9. Feedback NVIDIA may use any feedback you provide without restriction or compensation.

10. Governing Law This Agreement is governed by U.S. and Delaware law. Courts in Santa Clara County, California, have exclusive jurisdiction, except that either party may seek urgent injunctive relief in any court of competent jurisdiction.

11. Trade and Compliance You must comply with all export, import, trade, and sanctions laws, including U.S. Export Administration Regulations and OFAC rules.


Cosmos-Transfer2.5: A Suite of Diffusion-based World-to-World Models

Cosmos | Code | White Paper | Website

NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline, purpose-built to accelerate the development of physical AI systems, such as autonomous vehicles (AVs) and robots.

Model Overview

Description

Cosmos-Transfer2.5 is a family of highly performant, pre-trained world foundation models purpose-built for generating physics-aware images, videos, and world states aligned with input control conditions.

The Cosmos-Transfer2.5 diffusion models are a collection of diffusion-based world foundation models that generate dynamic, high-quality images and videos from text, image, or control video inputs. They can serve as building blocks for various applications and research related to world generation. This model is ready for commercial and non-commercial use.

Model Developer: NVIDIA

Model Versions

The Cosmos-Transfer2.5 diffusion-based model family includes the following models:

  • Cosmos-Transfer2.5-2B
    • Given a text prompt and one to four control input videos -- Canny edge, blurred RGB, segmentation mask, and depth map -- the model predicts a photorealistic output video guided by the control inputs. Automatic extraction is available for the edge and blur controls when only an RGB video is provided (a preprocessing sketch follows this list).

The model produces 720p video at 16 FPS.

  • Cosmos-Transfer2.5-2B/Auto/Multiview
    • Given a text prompt and 7 "world scenario" control input videos (from the front center, front left, front right, rear left, rear right, rear tele, and front tele cameras of an autonomous vehicle), the model generates 29 view-consistent frames for each of the 7 cameras at a resolution of 1280×720 (text-to-world). The model can additionally be conditioned on 1 or 2 initial latent frames from reference videos of the 7 cameras (image-to-world, video-to-world).

The model has been trained on 720p video at 10 FPS.
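
As noted in the Cosmos-Transfer2.5-2B entry above, the edge and blur controls can be derived automatically when only an RGB video is provided. The sketch below shows what such preprocessing might look like with OpenCV; the output file names, Canny thresholds, and blur kernel size are illustrative assumptions, not the repository's actual defaults.

```python
import cv2

def make_edge_and_blur_controls(src_path: str,
                                edge_path: str = "canny.mp4",   # hypothetical output names
                                blur_path: str = "blurred.mp4"):
    """Derive Canny-edge and blurred-RGB control videos from an RGB source video."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    edge_out = cv2.VideoWriter(edge_path, fourcc, fps, (w, h))
    blur_out = cv2.VideoWriter(blur_path, fourcc, fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)                       # thresholds are assumptions
        edge_out.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))  # writer expects 3 channels
        blur_out.write(cv2.GaussianBlur(frame, (21, 21), 0))    # kernel size is an assumption
    for v in (cap, edge_out, blur_out):
        v.release()
```

Both derived videos keep the source's spatio-temporal dimensions, which matches the requirement below that all control modalities describe the same content at identical resolution and length.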

License

This model is released under the NVIDIA Open Model License. Additional Information: Apache License 2.0.

For a custom license, please contact cosmos-license@nvidia.com.

Under the NVIDIA Open Model License, NVIDIA confirms:

  • Models are commercially usable.
  • You are free to create and distribute Derivative Models.
  • NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.

Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under the NVIDIA Open Model License Agreement will automatically terminate.

Deployment Geography:

Global

Use Case:

Physical AI: encompassing robotics, autonomous vehicles (AV), and more.

Release Date:

GitHub [10/06/2025] via https://github.com/nvidia-cosmos/cosmos-transfer2.5

Hugging Face [10/06/2025] via https://huggingface.co/collections/nvidia/cosmos-transfer25-6864569b8acaf966a107bfe3

Model Architecture

Cosmos-Transfer2.5-2B is a diffusion transformer model designed for video denoising in the latent space, modulated by multiple control branches.

The diffusion transformer network (“the base model”) is composed of interleaved self-attention, cross-attention, and feedforward layers as its building blocks. The cross-attention layers allow the model to condition on the input text throughout the denoising process. Before each layer, adaptive layer normalization is applied to embed the denoising timestep information. When an image or video is provided as input, its latent frames are concatenated with the generated frames along the temporal dimension, and augmentation noise is added to the conditional latent frames to bridge the gap between training and inference.

The control branch is formed by replicating a few transformer blocks of the base model. It processes the control input video to extract control signals, which are then injected into the corresponding transformer blocks of the base model, guiding the denoising process with structured control. When multiple control input videos are provided, each is processed by a dedicated control branch to extract modality-specific control signals. These signals are then combined using spatio-temporal weight maps and injected into the corresponding transformer blocks of the base model.
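
The following toy PyTorch sketch illustrates the injection pattern described above: each control branch replicates the first few base blocks and emits one signal per mirrored block, and the per-modality signals are blended with spatio-temporal weight maps before being added to the base model's hidden states. The module definitions and shapes are simplified assumptions (the real blocks interleave self-attention, cross-attention, and feedforward layers), not the released implementation.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one diffusion-transformer block (really self-attn + cross-attn + FFN)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(self.norm(x))

class ControlBranch(nn.Module):
    """Replica of the first few base blocks; emits one control signal per mirrored block."""
    def __init__(self, dim: int, n_blocks: int):
        super().__init__()
        self.blocks = nn.ModuleList(ToyBlock(dim) for _ in range(n_blocks))

    def forward(self, control_latents: torch.Tensor) -> list:
        signals, h = [], control_latents
        for blk in self.blocks:
            h = blk(h)
            signals.append(h)
        return signals

def denoise_step(base_blocks, x, branches, control_latents, weight_maps):
    """One pass through the base blocks with multi-modality control injection."""
    per_modality = [br(c) for br, c in zip(branches, control_latents)]
    n_controlled = len(per_modality[0]) if per_modality else 0
    for i, blk in enumerate(base_blocks):
        x = blk(x)
        if i < n_controlled:
            # Spatio-temporal weight maps decide how much each modality
            # contributes at each latent location before the sum is injected.
            x = x + sum(w * s[i] for w, s in zip(weight_maps, per_modality))
    return x

# Toy usage: 2 control modalities, the first 4 of 8 base blocks controlled.
dim, tokens = 64, 128
base = nn.ModuleList(ToyBlock(dim) for _ in range(8))
branches = [ControlBranch(dim, 4) for _ in range(2)]
x = torch.randn(1, tokens, dim)                       # noisy video latents
controls = [torch.randn(1, tokens, dim) for _ in range(2)]
weights = [torch.full((1, tokens, 1), 0.5) for _ in range(2)]
out = denoise_step(base, x, branches, controls, weights)
```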

This model was developed based on: Cosmos-Predict2.5

Number of model parameters: 2,358,047,744

Input/Output Specifications

  • Input

    • Input Type(s): Text+Video
    • Input Format(s):
      • Text: String
      • Control Input Video: mp4
    • Input Parameters:
      • Text: One-dimensional (1D)
      • Control Input Video: Three-dimensional (3D)
    • Other Properties Related to Input:
      • The input text string should contain fewer than 300 words and should provide descriptive content for world generation, such as a scene description, key objects or characters, background, and any specific actions or motions to be depicted within the 5-second duration.
      • The model supports control input videos of varying lengths, but lengths that are multiples of 93 frames (e.g., 93, 186, or 279 frames) perform best (a constraint-checking sketch follows this section).
      • The model supports four types of control input videos: blurred video, Canny edge video, depth map video, and segmentation mask video. When multiple control inputs are provided, they must be derived from the same source video, representing different modalities of the same content while maintaining identical spatio-temporal dimensions.
      • The control input video should have a spatial resolution of 1280×720 for the 720P model.
  • Output

    • Output Type(s): Video
    • Output Format(s): mp4
    • Output Parameters: Three-dimensional (3D)
    • Other Properties Related to Output: The output video has the same temporal length and spatial resolution as the control input video. The frame rate of the output video is determined by the model variant (16 FPS for the base model, 10 FPS for the Multiview variant).

The video content visualizes the input text description as a short animated scene, capturing key elements within the specified time constraints.
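
As a pre-flight check, the input constraints above can be validated before invoking the pipeline. The helper below is a minimal sketch; the function name and messages are hypothetical and not part of the released code.

```python
def check_transfer_inputs(prompt: str, num_frames: int, width: int, height: int) -> None:
    """Validate inputs against the documented Cosmos-Transfer2.5-2B constraints."""
    if len(prompt.split()) >= 300:
        raise ValueError("Prompt should contain fewer than 300 words.")
    if num_frames % 93 != 0:
        # Other lengths are supported, but multiples of 93 frames perform best.
        print(f"Warning: {num_frames} frames is not a multiple of 93.")
    if (width, height) != (1280, 720):
        raise ValueError("Control video must be 1280x720 for the 720P model.")

check_transfer_inputs("A robot arm stacks red blocks on a table.", 93, 1280, 720)
```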

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine(s):

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper

Note: Only BF16 precision is tested. Other precisions like FP16 or FP32 are not officially supported.

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Training Dataset:

Data Modality

  • Image
  • Text
  • Video

Data Collection Method by dataset

  • Automated

Labeling Method by dataset

  • Hybrid: Human, Automated

Testing Dataset:

Data Collection Method by dataset

  • Automated

Labeling Method by dataset

  • Hybrid: Human, Automated

Evaluation

Please see our technical paper for detailed evaluations of the base model. The control models are built upon the base foundation model.

Data Collection Method:

  • Automated

Labeling Method:

  • Hybrid: Human, Automated

System Requirements and Performance: This model requires 65.4 GB of GPU VRAM.

The following table shows generation times across different NVIDIA GPU hardware for single-GPU inference:

GPU Hardware        Generation Time (Cosmos-Transfer2.5-2B, Segmentation)
NVIDIA B200           285.83 sec
NVIDIA H100 NVL       719.4 sec
NVIDIA H100 PCIe      870.3 sec
NVIDIA H20           2326.6 sec
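
Given the 65.4 GB requirement above, it may be worth confirming the device has enough memory before launching single-GPU inference. A minimal check with PyTorch (the hard-coded threshold is taken from the stated requirement):

```python
import torch

REQUIRED_GIB = 65.4  # documented single-GPU VRAM requirement

if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gib:.1f} GiB total")
    if total_gib < REQUIRED_GIB:
        print("Warning: insufficient VRAM for single-GPU inference.")
else:
    print("No CUDA device found.")
```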

Operating System(s):

  • Linux (We have not tested on other operating systems.)


Usage

Limitations

Despite various improvements in world generation for Physical AI, Cosmos-Transfer2.5 models still face technical and application limitations for world-to-world generation. In particular, they struggle to generate long, high-resolution videos without artifacts. Common issues include temporal inconsistency, camera and object motion instability, and imprecise interactions. The models may inaccurately represent 3D space, 4D space-time, or physical laws in the generated videos, leading to artifacts such as disappearing or morphing objects, unrealistic interactions, and implausible motions. As a result, applying these models for applications that require simulating physical law-grounded environments or complex multi-agent dynamics remains challenging.

Inference:

Acceleration Engine: PyTorch, Transformer Engine

Test Hardware: H100, A100, GB200

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have the proper rights and permissions for all input image and video content; if an input image or video includes people, personal health information, or intellectual property, the generated output will not blur the depicted subjects or preserve their proportions.

Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.

For more detailed information on ethical considerations for this model, please see the subcards of Explainability, Bias, Safety & Security, and Privacy below. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

Plus Plus (++) Promise

We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been:

  • Verified to comply with current applicable disclosure laws, regulations, and industry standards.
  • Verified to comply with applicable privacy labeling requirements.
  • Annotated to describe the collector/source (NVIDIA or a third-party).
  • Characterized for technical limitations.
  • Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests.
  • Reviewed before release.
  • Tagged for known restrictions and potential safety implications.

Bias

  • Participation considerations from adversely impacted groups (protected classes) in model design and testing: None
  • Measures taken to mitigate against unwanted bias: None

Explainability

  • Intended Application & Domain: World Generation
  • Model Type: Transformer
  • Intended Users: Physical AI developers
  • Output: Videos
  • Describe how the model works: Generates videos based on text and video inputs.
  • Technical Limitations: The model may not follow the video input accurately.
  • Verified to have met prescribed NVIDIA quality standards: Yes
  • Performance Metrics: Quantitative and qualitative evaluation. We use PAIBench-Transfer, a benchmark dataset containing 600 videos spanning diverse domains such as driving and robotics. The evaluation is structured around two key dimensions: adherence to control inputs (how well the generated video follows the provided conditions) and overall video quality (measuring realism and consistency).
  • Potential Known Risks: The model can generate all forms of videos, including content that may be considered toxic, offensive, or indecent.
  • Licensing: NVIDIA Open Model License. Additional Information: Apache License 2.0.

Privacy

  • Generatable or reverse engineerable personal data? No
  • Personal data used to create this model? None Known
  • Was consent obtained for any personal data used? None Known
  • How often is dataset reviewed? Before Release
  • Is there provenance for all datasets used in training? Yes
  • Does data labeling (annotation, metadata) comply with privacy laws? Yes
  • Is data compliant with data subject requests for data correction or removal, if such a request was made? No, not possible with externally-sourced data.
  • Applicable Privacy Policy: https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

Safety

  • Model Application(s): World Generation
  • Describe the life-critical impact (if present): None Known
  • Use Case Restrictions: NVIDIA Open Model License. Additional Information: Apache License 2.0.
  • Model and dataset restrictions: The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. Model checkpoints are made available on Hugging Face and may become available on cloud providers' model catalogs.