jacobitterman commited on
Commit
dad38ea
·
verified ·
1 Parent(s): 37f393c

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +29 -28
README.md CHANGED
@@ -13,7 +13,7 @@ library_name: diffusers
13
  # LTX-Video Model Card
14
  This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
15
 
16
- LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
17
  We provide a model for both text-to-video as well as image+text-to-video usecases
18
 
19
  <img src="./media/trailer.gif" alt="trailer" width="512">
@@ -26,6 +26,15 @@ We provide a model for both text-to-video as well as image+text-to-video usecase
26
  | ![example9](./media/ltx-video_example_00009.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man with graying hair, a beard, and a gray shirt...</summary>A man with graying hair, a beard, and a gray shirt looks down and to his right, then turns his head to the left. The camera angle is a close-up, focused on the man's face. The lighting is dim, with a greenish tint. The scene appears to be real-life footage. Step</details> | ![example10](./media/ltx-video_example_00010.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A clear, turquoise river flows through a rocky canyon...</summary>A clear, turquoise river flows through a rocky canyon, cascading over a small waterfall and forming a pool of water at the bottom.The river is the main focus of the scene, with its clear water reflecting the surrounding trees and rocks. The canyon walls are steep and rocky, with some vegetation growing on them. The trees are mostly pine trees, with their green needles contrasting with the brown and gray rocks. The overall tone of the scene is one of peace and tranquility.</details> | ![example11](./media/ltx-video_example_00011.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man in a suit enters a room and speaks to two women...</summary>A man in a suit enters a room and speaks to two women sitting on a couch. The man, wearing a dark suit with a gold tie, enters the room from the left and walks towards the center of the frame. He has short gray hair, light skin, and a serious expression. He places his right hand on the back of a chair as he approaches the couch. Two women are seated on a light-colored couch in the background. The woman on the left wears a light blue sweater and has short blonde hair. The woman on the right wears a white sweater and has short blonde hair. The camera remains stationary, focusing on the man as he enters the room. The room is brightly lit, with warm tones reflecting off the walls and furniture. The scene appears to be from a film or television show.</details> | ![example12](./media/ltx-video_example_00012.gif)<br><details style="max-width: 300px; margin: auto;"><summary>The waves crash against the jagged rocks of the shoreline...</summary>The waves crash against the jagged rocks of the shoreline, sending spray high into the air.The rocks are a dark gray color, with sharp edges and deep crevices. The water is a clear blue-green, with white foam where the waves break against the rocks. The sky is a light gray, with a few white clouds dotting the horizon.</details> |
27
  | ![example13](./media/ltx-video_example_00013.gif)<br><details style="max-width: 300px; margin: auto;"><summary>The camera pans across a cityscape of tall buildings...</summary>The camera pans across a cityscape of tall buildings with a circular building in the center. The camera moves from left to right, showing the tops of the buildings and the circular building in the center. The buildings are various shades of gray and white, and the circular building has a green roof. The camera angle is high, looking down at the city. The lighting is bright, with the sun shining from the upper left, casting shadows from the buildings. The scene is computer-generated imagery.</details> | ![example14](./media/ltx-video_example_00014.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man walks towards a window, looks out, and then turns around...</summary>A man walks towards a window, looks out, and then turns around. He has short, dark hair, dark skin, and is wearing a brown coat over a red and gray scarf. He walks from left to right towards a window, his gaze fixed on something outside. The camera follows him from behind at a medium distance. The room is brightly lit, with white walls and a large window covered by a white curtain. As he approaches the window, he turns his head slightly to the left, then back to the right. He then turns his entire body to the right, facing the window. The camera remains stationary as he stands in front of the window. The scene is captured in real-life footage.</details> | ![example15](./media/ltx-video_example_00015.gif)<br><details style="max-width: 300px; margin: auto;"><summary>Two police officers in dark blue uniforms and matching hats...</summary>Two police officers in dark blue uniforms and matching hats enter a dimly lit room through a doorway on the left side of the frame. The first officer, with short brown hair and a mustache, steps inside first, followed by his partner, who has a shaved head and a goatee. Both officers have serious expressions and maintain a steady pace as they move deeper into the room. The camera remains stationary, capturing them from a slightly low angle as they enter. The room has exposed brick walls and a corrugated metal ceiling, with a barred window visible in the background. The lighting is low-key, casting shadows on the officers' faces and emphasizing the grim atmosphere. The scene appears to be from a film or television show.</details> | ![example16](./media/ltx-video_example_00016.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman with short brown hair, wearing a maroon sleeveless top...</summary>A woman with short brown hair, wearing a maroon sleeveless top and a silver necklace, walks through a room while talking, then a woman with pink hair and a white shirt appears in the doorway and yells. The first woman walks from left to right, her expression serious; she has light skin and her eyebrows are slightly furrowed. The second woman stands in the doorway, her mouth open in a yell; she has light skin and her eyes are wide. The room is dimly lit, with a bookshelf visible in the background. The camera follows the first woman as she walks, then cuts to a close-up of the second woman's face. The scene is captured in real-life footage.</details> |
28
 
 
 
 
 
 
 
 
 
 
29
 
30
  ## Model Details
31
  - **Developed by:** Lightricks
@@ -37,9 +46,15 @@ We provide a model for both text-to-video as well as image+text-to-video usecase
37
 
38
  ### Direct use
39
  You can use the model for purposes under the license:
40
- - version 0.9: [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors)
41
- - version 0.9.1 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors)
42
- - version 0.9.5 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.5.safetensors)
 
 
 
 
 
 
43
 
44
 
45
  ### General tips:
@@ -48,10 +63,11 @@ You can use the model for purposes under the license:
48
  * Prompts should be in English. The more elaborate the better. Good prompt looks like `The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim.`
49
 
50
  ### Online demo
51
- The model is accessible right away via following links:
52
- - [HF Playground](https://huggingface.co/spaces/Lightricks/LTX-Video-Playground)
53
  - [Fal.ai text-to-video](https://fal.ai/models/fal-ai/ltx-video)
54
  - [Fal.ai image-to-video](https://fal.ai/models/fal-ai/ltx-video/image-to-video)
 
55
 
56
  ### ComfyUI
57
  To use our model with ComfyUI, please follow the instructions at a dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).
@@ -72,15 +88,6 @@ source env/bin/activate
72
  python -m pip install -e .\[inference-script\]
73
  ```
74
 
75
- Then, download the model from [Hugging Face](https://huggingface.co/Lightricks/LTX-Video-0.9.1)
76
-
77
- ```python
78
- from huggingface_hub import snapshot_download
79
-
80
- model_path = 'PATH' # The local directory to save downloaded checkpoint
81
- snapshot_download("Lightricks/LTX-Video-0.9.1", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
82
- ```
83
-
84
  #### Inference
85
 
86
  To use our model, please follow the inference code in [inference.py](https://github.com/Lightricks/LTX-Video/blob/main/inference.py):
@@ -88,13 +95,13 @@ To use our model, please follow the inference code in [inference.py](https://git
88
  ##### For text-to-video generation:
89
 
90
  ```bash
91
- python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
92
  ```
93
 
94
  ##### For image-to-video generation:
95
 
96
  ```bash
97
- python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
98
  ```
99
 
100
  ### Diffusers 🧨
@@ -109,14 +116,12 @@ pip install -U git+https://github.com/huggingface/diffusers
109
 
110
  Now, you can run the examples below:
111
 
112
- Text-to-Video:
113
-
114
- ```python
115
  import torch
116
  from diffusers import LTXPipeline
117
  from diffusers.utils import export_to_video
118
 
119
- pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video-0.9.1", torch_dtype=torch.bfloat16)
120
  pipe.to("cuda")
121
 
122
  prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
@@ -129,20 +134,18 @@ video = pipe(
129
  height=480,
130
  num_frames=161,
131
  num_inference_steps=50,
132
- decode_timestep=0.03,
133
- decode_noise_scale=0.025,
134
  ).frames[0]
135
  export_to_video(video, "output.mp4", fps=24)
136
  ```
137
 
138
- Image-to-Video:
139
 
140
- ```python
141
  import torch
142
  from diffusers import LTXImageToVideoPipeline
143
  from diffusers.utils import export_to_video, load_image
144
 
145
- pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video-0.9.1", torch_dtype=torch.bfloat16)
146
  pipe.to("cuda")
147
 
148
  image = load_image(
@@ -159,8 +162,6 @@ video = pipe(
159
  height=480,
160
  num_frames=161,
161
  num_inference_steps=50,
162
- decode_timestep=0.03,
163
- decode_noise_scale=0.025,
164
  ).frames[0]
165
  export_to_video(video, "output.mp4", fps=24)
166
  ```
 
13
  # LTX-Video Model Card
14
  This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
15
 
16
+ LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 30 FPS videos at a 1216×704 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
17
  We provide a model for both text-to-video as well as image+text-to-video usecases
18
 
19
  <img src="./media/trailer.gif" alt="trailer" width="512">
 
26
  | ![example9](./media/ltx-video_example_00009.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man with graying hair, a beard, and a gray shirt...</summary>A man with graying hair, a beard, and a gray shirt looks down and to his right, then turns his head to the left. The camera angle is a close-up, focused on the man's face. The lighting is dim, with a greenish tint. The scene appears to be real-life footage. Step</details> | ![example10](./media/ltx-video_example_00010.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A clear, turquoise river flows through a rocky canyon...</summary>A clear, turquoise river flows through a rocky canyon, cascading over a small waterfall and forming a pool of water at the bottom.The river is the main focus of the scene, with its clear water reflecting the surrounding trees and rocks. The canyon walls are steep and rocky, with some vegetation growing on them. The trees are mostly pine trees, with their green needles contrasting with the brown and gray rocks. The overall tone of the scene is one of peace and tranquility.</details> | ![example11](./media/ltx-video_example_00011.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man in a suit enters a room and speaks to two women...</summary>A man in a suit enters a room and speaks to two women sitting on a couch. The man, wearing a dark suit with a gold tie, enters the room from the left and walks towards the center of the frame. He has short gray hair, light skin, and a serious expression. He places his right hand on the back of a chair as he approaches the couch. Two women are seated on a light-colored couch in the background. The woman on the left wears a light blue sweater and has short blonde hair. The woman on the right wears a white sweater and has short blonde hair. The camera remains stationary, focusing on the man as he enters the room. The room is brightly lit, with warm tones reflecting off the walls and furniture. The scene appears to be from a film or television show.</details> | ![example12](./media/ltx-video_example_00012.gif)<br><details style="max-width: 300px; margin: auto;"><summary>The waves crash against the jagged rocks of the shoreline...</summary>The waves crash against the jagged rocks of the shoreline, sending spray high into the air.The rocks are a dark gray color, with sharp edges and deep crevices. The water is a clear blue-green, with white foam where the waves break against the rocks. The sky is a light gray, with a few white clouds dotting the horizon.</details> |
27
  | ![example13](./media/ltx-video_example_00013.gif)<br><details style="max-width: 300px; margin: auto;"><summary>The camera pans across a cityscape of tall buildings...</summary>The camera pans across a cityscape of tall buildings with a circular building in the center. The camera moves from left to right, showing the tops of the buildings and the circular building in the center. The buildings are various shades of gray and white, and the circular building has a green roof. The camera angle is high, looking down at the city. The lighting is bright, with the sun shining from the upper left, casting shadows from the buildings. The scene is computer-generated imagery.</details> | ![example14](./media/ltx-video_example_00014.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man walks towards a window, looks out, and then turns around...</summary>A man walks towards a window, looks out, and then turns around. He has short, dark hair, dark skin, and is wearing a brown coat over a red and gray scarf. He walks from left to right towards a window, his gaze fixed on something outside. The camera follows him from behind at a medium distance. The room is brightly lit, with white walls and a large window covered by a white curtain. As he approaches the window, he turns his head slightly to the left, then back to the right. He then turns his entire body to the right, facing the window. The camera remains stationary as he stands in front of the window. The scene is captured in real-life footage.</details> | ![example15](./media/ltx-video_example_00015.gif)<br><details style="max-width: 300px; margin: auto;"><summary>Two police officers in dark blue uniforms and matching hats...</summary>Two police officers in dark blue uniforms and matching hats enter a dimly lit room through a doorway on the left side of the frame. The first officer, with short brown hair and a mustache, steps inside first, followed by his partner, who has a shaved head and a goatee. Both officers have serious expressions and maintain a steady pace as they move deeper into the room. The camera remains stationary, capturing them from a slightly low angle as they enter. The room has exposed brick walls and a corrugated metal ceiling, with a barred window visible in the background. The lighting is low-key, casting shadows on the officers' faces and emphasizing the grim atmosphere. The scene appears to be from a film or television show.</details> | ![example16](./media/ltx-video_example_00016.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman with short brown hair, wearing a maroon sleeveless top...</summary>A woman with short brown hair, wearing a maroon sleeveless top and a silver necklace, walks through a room while talking, then a woman with pink hair and a white shirt appears in the doorway and yells. The first woman walks from left to right, her expression serious; she has light skin and her eyebrows are slightly furrowed. The second woman stands in the doorway, her mouth open in a yell; she has light skin and her eyes are wide. The room is dimly lit, with a bookshelf visible in the background. The camera follows the first woman as she walks, then cuts to a close-up of the second woman's face. The scene is captured in real-life footage.</details> |
28
 
29
+ # Models
30
+
31
+ | Model | Version | Notes | inference.py config | ComfyUI workflow (Recommended) |
32
+ |--------------------|---------|---------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|------------------|
33
+ | ltxv-13b | 0.9.7 | Highest quality, requires more VRAM | [ltxv-13b-0.9.7-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
34
+ | ltxv-13b-fp8 | 0.9.7 | Quantized model | Coming soon | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
35
+ | ltxv-2b | 0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | [ltxv-2b-0.9.6-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-dev.yaml) | [ltxvideo-i2v.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v.json) |
36
+ | ltxv-2b-distilled | 0.9.6 | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | [ltxv-2b-0.9.6-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-distilled.yaml) | [ltxvideo-i2v-distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v-distilled.json) |
37
+
38
 
39
  ## Model Details
40
  - **Developed by:** Lightricks
 
46
 
47
  ### Direct use
48
  You can use the model for purposes under the license:
49
+ - 2B version 0.9: [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.license.txt)
50
+ - 2B version 0.9.1 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.license.txt)
51
+ - 2B version 0.9.5 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.5.license.txt)
52
+ - 2B version 0.9.6-dev [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-2b-0.9.6-dev-04-25.license.txt)
53
+ - 2B version 0.9.6-distilled [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-2b-0.9.6-distilled-04-25.license.txt)
54
+ - 13B version 0.9.7-dev [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-dev.license.txt)
55
+ - 13B version 0.9.7-dev-fp8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-dev-fp8.license.txt)
56
+ - Temporal upscaler version 0.9.7 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-temporal-upscaler-0.9.7.license.txt)
57
+ - Spatial upscaler version 0.9.7 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-spatial-upscaler-0.9.7.license.txt)
58
 
59
 
60
  ### General tips:
 
63
  * Prompts should be in English. The more elaborate the better. Good prompt looks like `The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim.`
64
 
65
  ### Online demo
66
+ The model is accessible right away via the following links:
67
+ - [LTX-Studio image-to-video](https://app.ltx.studio/ltx-video)
68
  - [Fal.ai text-to-video](https://fal.ai/models/fal-ai/ltx-video)
69
  - [Fal.ai image-to-video](https://fal.ai/models/fal-ai/ltx-video/image-to-video)
70
+ - [Replicate text-to-video and image-to-video](https://replicate.com/lightricks/ltx-video)
71
 
72
  ### ComfyUI
73
  To use our model with ComfyUI, please follow the instructions at a dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).
 
88
  python -m pip install -e .\[inference-script\]
89
  ```
90
 
 
 
 
 
 
 
 
 
 
91
  #### Inference
92
 
93
  To use our model, please follow the inference code in [inference.py](https://github.com/Lightricks/LTX-Video/blob/main/inference.py):
 
95
  ##### For text-to-video generation:
96
 
97
  ```bash
98
+ python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config ltxv-13b-0.9.7-dev.yaml
99
  ```
100
 
101
  ##### For image-to-video generation:
102
 
103
  ```bash
104
+ python inference.py --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config ltxv-13b-0.9.7-dev.yaml
105
  ```
106
 
107
  ### Diffusers 🧨
 
116
 
117
  Now, you can run the examples below:
118
 
119
+ ```py
 
 
120
  import torch
121
  from diffusers import LTXPipeline
122
  from diffusers.utils import export_to_video
123
 
124
+ pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
125
  pipe.to("cuda")
126
 
127
  prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
 
134
  height=480,
135
  num_frames=161,
136
  num_inference_steps=50,
 
 
137
  ).frames[0]
138
  export_to_video(video, "output.mp4", fps=24)
139
  ```
140
 
141
+ For image-to-video:
142
 
143
+ ```py
144
  import torch
145
  from diffusers import LTXImageToVideoPipeline
146
  from diffusers.utils import export_to_video, load_image
147
 
148
+ pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
149
  pipe.to("cuda")
150
 
151
  image = load_image(
 
162
  height=480,
163
  num_frames=161,
164
  num_inference_steps=50,
 
 
165
  ).frames[0]
166
  export_to_video(video, "output.mp4", fps=24)
167
  ```