---
license: agpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---
# AMD Hummingbird Image-to-Video Model
⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited computational budgets.
Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.
To raise output resolution with minimal overhead, we add a **super-resolution** module at the end of the pipeline. We also use **ReNeg**, an AMD-proposed reward-guided framework that learns negative embeddings via gradient descent, to further boost visual quality.
As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.
Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models.
We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
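
As a rough illustration of where a negative embedding typically enters sampling, the sketch below shows the standard classifier-free guidance combination, in which the prediction conditioned on the negative embedding takes the place of the usual empty-prompt prediction. This is an assumption about usage, not a description of the Hummingbird-I2V code: the function name `guided_noise`, its parameters, and the toy values are hypothetical, and the actual inference pipeline may differ.

```python
from typing import List, Sequence


def guided_noise(
    eps_cond: Sequence[float],
    eps_neg: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    eps_cond: prediction conditioned on the positive prompt (hypothetical input).
    eps_neg:  prediction conditioned on the negative embedding, e.g. one
              learned by a method such as ReNeg (hypothetical input).
    Returns eps_neg + guidance_scale * (eps_cond - eps_neg), element-wise.
    """
    if len(eps_cond) != len(eps_neg):
        raise ValueError("eps_cond and eps_neg must have the same length")
    return [n + guidance_scale * (c - n) for c, n in zip(eps_cond, eps_neg)]


# Toy values only; real predictions are tensors produced by the U-Net.
toy_cond = [1.0, 2.0]
toy_neg = [0.0, 1.0]
toy_result = guided_noise(toy_cond, toy_neg, guidance_scale=2.0)
```

With a guidance scale of 1 the result equals the conditional prediction, and larger scales push the output further away from whatever the negative embedding encodes, which is why a better-learned negative embedding can improve visual quality without changing the model weights.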
<div style="margin: 0; padding: 0;">
<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
</div>
<div style="margin: 0; padding: 0;">
<table>
<tr>
<td><img src="src/01.gif"></td>
<td><img src="src/02.gif"></td>
<td><img src="src/03.gif"></td>
<td><img src="src/04.gif"></td>
</tr>
<tr>
<td><img src="src/05.gif"></td>
<td><img src="src/06.gif"></td>
<td><img src="src/07.gif"></td>
<td><img src="src/08.gif"></td>
</tr>
<tr>
<td><img src="src/09.gif"></td>
<td><img src="src/10.gif"></td>
<td><img src="src/11.gif"></td>
<td><img src="src/12.gif"></td>
</tr>
</table>
</div>
<style>
table {
width: auto;
border-collapse: collapse;
margin: 0 auto;
}
th, td {
border: 1px solid #ddd;
text-align: center;
padding: 0;
vertical-align: middle;
width: 256px;
}
img {
width: 384px;
height: 240px;
object-fit: cover;
margin: 0 !important;
padding: 0 !important;
display: block;
}
.i2v_training_pipeline {
width: 100%;
max-width: 1200px;
height: auto;
object-fit: contain;
margin: 0 auto;
}
</style>