---
license: agpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---

# AMD Hummingbird Image-to-Video Model

⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** image-to-video (I2V) model designed for high-quality video synthesis under limited computational budgets.

Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.

To improve output resolution with minimal overhead, we add a **super-resolution** module at the end of the pipeline. We also use **ReNeg**, an AMD-proposed reward-guided framework that learns negative embeddings via gradient descent, to further boost visual quality.
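
To make the idea of learning a negative embedding by gradient descent concrete, here is a minimal toy sketch. It is not the ReNeg implementation: it assumes the negative embedding feeds the negative branch of classifier-free guidance, and it replaces the video model and the reward model with hypothetical stand-ins (`toy_denoiser`, `reward`, and `target` are invented for illustration) so that the gradient can be written by hand.

```python
def guided_prediction(eps_pos, eps_neg, scale):
    """Classifier-free guidance: start at the negative branch, move toward the positive one."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

def toy_denoiser(weights, emb):
    """Hypothetical stand-in for the video model: an elementwise linear map."""
    return [w * e for w, e in zip(weights, emb)]

def reward(output, target):
    """Hypothetical reward: higher when the guided output is closer to `target`."""
    return -sum((o - t) ** 2 for o, t in zip(output, target))

def learn_negative_embedding(weights, pos_emb, target, scale=7.5, steps=200):
    neg_emb = [0.0] * len(pos_emb)  # start from an all-zero negative embedding
    # Step size 1/L, where L bounds the curvature of the reward in neg_emb.
    lr = 1.0 / (2.0 * (1.0 - scale) ** 2 * max(w * w for w in weights))
    history = []
    eps_pos = toy_denoiser(weights, pos_emb)
    for _ in range(steps):
        out = guided_prediction(eps_pos, toy_denoiser(weights, neg_emb), scale)
        history.append(reward(out, target))
        # d(reward)/d(neg_emb), derived by hand for this linear toy model.
        grad = [-2.0 * (1.0 - scale) * w * (o - t)
                for w, o, t in zip(weights, out, target)]
        neg_emb = [n + lr * g for n, g in zip(neg_emb, grad)]  # gradient ascent
    return neg_emb, history

weights = [0.5, 1.0, 2.0]
pos_emb = [1.0, -1.0, 0.5]
target = [0.0, 0.0, 0.0]
neg_emb, history = learn_negative_embedding(weights, pos_emb, target)
print(f"reward: {history[0]:.4f} -> {history[-1]:.4f}")
```

In a real system the gradient would come from backpropagating a learned reward through the diffusion model rather than from a closed-form expression. The structure is the same: only the negative embedding is updated, while the model weights stay fixed.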

As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.

Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and is competitive with significantly larger DiT-based models.

We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.

<div style="margin: 0; padding: 0;">
<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
</div>

<div style="margin: 0; padding: 0;">
<table>
<tr>
<td><img src="src/01.gif"></td>
<td><img src="src/02.gif"></td>
<td><img src="src/03.gif"></td>
<td><img src="src/04.gif"></td>
</tr>
<tr>
<td><img src="src/05.gif"></td>
<td><img src="src/06.gif"></td>
<td><img src="src/07.gif"></td>
<td><img src="src/08.gif"></td>
</tr>
<tr>
<td><img src="src/09.gif"></td>
<td><img src="src/10.gif"></td>
<td><img src="src/11.gif"></td>
<td><img src="src/12.gif"></td>
</tr>
</table>
</div>

<style>
table {
  width: auto;
  border-collapse: collapse;
  margin: 0 auto;
}
th, td {
  border: 1px solid #ddd;
  text-align: center;
  padding: 0;
  vertical-align: middle;
  width: 256px;
}
img {
  width: 384px;
  height: 240px;
  object-fit: cover;
  margin: 0 !important;
  padding: 0 !important;
  display: block;
}
.i2v_training_pipeline {
  width: 100%;
  max-width: 1200px;
  height: auto;
  object-fit: contain;
  margin: 0 auto;
}
</style>