---
license: agpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---
# AMD Hummingbird Image-to-Video Model
⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** image-to-video (I2V) model designed for high-quality video synthesis under limited computational budgets.  
Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.  
To further improve output resolution with minimal overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning negative embeddings via gradient descent, to further boost visual quality.  
As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.  
Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and is competitive with significantly larger DiT-based models.  
We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
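
To make the idea of learning a negative embedding by gradient descent on a reward more concrete, here is a minimal toy sketch. It is not the ReNeg or Hummingbird-I2V implementation: the one-dimensional `toy_denoiser`, the quadratic `toy_reward`, and every name and hyperparameter below are hypothetical, and it assumes the negative embedding enters through classifier-free guidance, which this card does not state.

```python
# Toy illustration only: a 1-D stand-in for reward-guided learning of a
# negative embedding. Nothing here comes from the Hummingbird or ReNeg code.


def toy_denoiser(embedding: float) -> float:
    """Hypothetical noise prediction that depends linearly on the embedding."""
    return 0.5 * embedding


def guided_prediction(cond: float, neg: float, scale: float) -> float:
    """Classifier-free guidance: start at the negative branch, push toward the conditional one."""
    eps_cond = toy_denoiser(cond)
    eps_neg = toy_denoiser(neg)
    return eps_neg + scale * (eps_cond - eps_neg)


def toy_reward(output: float, target: float = 2.0) -> float:
    """Hypothetical reward model: highest when the output equals the target."""
    return -((output - target) ** 2)


def learn_negative_embedding(cond: float, scale: float, steps: int = 200,
                             lr: float = 0.05, h: float = 1e-4) -> float:
    """Tune the negative embedding by gradient ascent on the toy reward."""
    neg = 0.0  # start from a zero embedding (an assumption of this sketch)
    for _ in range(steps):
        # Central finite difference of the reward with respect to `neg`.
        r_plus = toy_reward(guided_prediction(cond, neg + h, scale))
        r_minus = toy_reward(guided_prediction(cond, neg - h, scale))
        grad = (r_plus - r_minus) / (2.0 * h)
        # Ascent on the reward is descent on the negative reward.
        neg += lr * grad
    return neg


if __name__ == "__main__":
    cond, scale = 1.0, 7.5
    neg = learn_negative_embedding(cond, scale)
    print("reward with zero negative embedding:",
          toy_reward(guided_prediction(cond, 0.0, scale)))
    print("reward with learned negative embedding:",
          toy_reward(guided_prediction(cond, neg, scale)))
```

In a real pipeline the embedding would be a tensor, the denoiser a U-Net, and the gradient would come from backpropagation through a learned reward model instead of finite differences; the sketch only shows the shape of the optimization loop.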

<div style="margin: 0; padding: 0;">
  <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
</div>

<div style="margin: 0; padding: 0;">
  <table>
    <tr>
      <td><img src="src/01.gif"></td>
      <td><img src="src/02.gif"></td>
      <td><img src="src/03.gif"></td>
      <td><img src="src/04.gif"></td>
    </tr>
    <tr>
      <td><img src="src/05.gif"></td>
      <td><img src="src/06.gif"></td>
      <td><img src="src/07.gif"></td>
      <td><img src="src/08.gif"></td>
    </tr>
    <tr>
      <td><img src="src/09.gif"></td>
      <td><img src="src/10.gif"></td>
      <td><img src="src/11.gif"></td>
      <td><img src="src/12.gif"></td>
    </tr>
  </table>
</div>

<style>
  table {
    width: auto;
    border-collapse: collapse;
    margin: 0 auto;
  }
  th, td {
    border: 1px solid #ddd;
    text-align: center;
    padding: 0;
    vertical-align: middle;
    width: 256px;
  }
  img {
    width: 384px;
    height: 240px;
    object-fit: cover;
    margin: 0 !important;
    padding: 0 !important;
    display: block;
  }
  .i2v_training_pipeline {
    width: 100%;
    max-width: 1200px;
    height: auto;
    object-fit: contain;
    margin: 0 auto;
  }
</style>