---
license: agpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---
# AMD Hummingbird Image-to-Video Model
⚡️ In this work, we present AMD Hummingbird-I2V, a compact and efficient diffusion-based image-to-video (I2V) model designed for high-quality video synthesis under limited computational budgets.
Hummingbird-I2V adopts a lightweight U-Net architecture with 0.9B parameters and a novel two-stage training strategy guided by reward-based feedback, which substantially improves inference speed, model efficiency, and visual quality.
To raise the output resolution with minimal overhead, we add a super-resolution module at the end of the pipeline. We also leverage ReNeg, an AMD-proposed reward-guided framework that learns negative embeddings via gradient descent, to further boost visual quality.
As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.
Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results against significantly larger DiT-based models.
We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
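
The headline figures above imply two quick back-of-the-envelope numbers, sketched below in Python: the memory needed just to store 0.9B U-Net weights at a few common precisions, and the average wall-clock time per step implied by 11 seconds for 16 steps. The precisions are assumptions (this card does not state the checkpoint or benchmark dtype), the memory figure excludes activations and any other pipeline components such as the super-resolution module, and because the 11-second figure covers the full 4K pipeline, the per-step value is an upper bound on the denoising cost rather than a measured step time.

```python
# Back-of-the-envelope figures derived from the numbers quoted in this card.
# Assumption: weights stored densely at the listed precisions; the card does
# not say which dtype the released checkpoint or the benchmark run uses.

NUM_PARAMS = 0.9e9          # "0.9B parameters" (U-Net only)
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

END_TO_END_SECONDS = 11.0   # 4K generation time on an RX 7900 XTX
INFERENCE_STEPS = 16


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


def mean_seconds_per_step(total_seconds: float, steps: int) -> float:
    """Upper bound on per-step denoising time: the total also includes
    non-denoising work such as super-resolution."""
    return total_seconds / steps


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>9}: {weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
    per_step = mean_seconds_per_step(END_TO_END_SECONDS, INFERENCE_STEPS)
    print(f"<= {per_step:.4f} s per step on average")
```

At fp16/bf16 the U-Net weights alone come to roughly 1.68 GiB, and the average budget per step is at most 0.6875 s.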

<div style="margin: 0; padding: 0;">
<img src="src/i2v_training_pipeline.png" alt="Hummingbird-I2V training pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
</div>
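
ReNeg is described above only at a high level: a negative embedding is learned by gradient descent under reward guidance. The Python sketch below is a toy illustration of that idea, not AMD's implementation. It assumes the common convention in which the negative embedding takes the place of the unconditional branch in classifier-free guidance, and it replaces the U-Net with a hypothetical linear `denoiser` and the reward model with a hypothetical quadratic `reward`, so the gradient can be written by hand instead of computed by backpropagation. Only the negative embedding is updated; the toy "model" stays fixed.

```python
# Toy illustration of reward-guided negative-embedding learning (ReNeg-style).
# Everything here is a hypothetical stand-in: the "denoiser" is a linear map,
# embeddings are short lists of floats, and the reward is a simple quadratic.
from typing import List

Vec = List[float]


def denoiser(x: Vec, cond: Vec, a: float = 0.5) -> Vec:
    """Hypothetical noise predictor eps(x, c) = a*x + c."""
    return [a * xi + ci for xi, ci in zip(x, cond)]


def cfg(x: Vec, pos: Vec, neg: Vec, w: float) -> Vec:
    """Classifier-free guidance with the negative embedding in place of the
    unconditional one: eps_neg + w * (eps_pos - eps_neg)."""
    e_pos, e_neg = denoiser(x, pos), denoiser(x, neg)
    return [en + w * (ep - en) for ep, en in zip(e_pos, e_neg)]


def reward(out: Vec, target: Vec) -> float:
    """Hypothetical reward: higher when the guided output is near `target`."""
    return -sum((o - t) ** 2 for o, t in zip(out, target))


def learn_negative(x: Vec, pos: Vec, target: Vec, w: float = 3.0,
                   lr: float = 0.05, steps: int = 100) -> Vec:
    neg = [0.0] * len(pos)  # the only learnable quantity
    for _ in range(steps):
        out = cfg(x, pos, neg, w)
        # For this linear toy, d(out)/d(neg) = (1 - w), so
        # dR/d(neg) = -2 * (out - target) * (1 - w).
        grad = [-2.0 * (o - t) * (1.0 - w) for o, t in zip(out, target)]
        # Gradient ascent on the reward (i.e. descent on -reward).
        neg = [n + lr * g for n, g in zip(neg, grad)]
    return neg


if __name__ == "__main__":
    x, pos, target = [1.0, -1.0], [0.2, 0.4], [0.0, 0.0]
    neg = learn_negative(x, pos, target)
    print("learned negative embedding:", neg)
    print("reward before:", reward(cfg(x, pos, [0.0, 0.0], 3.0), target))
    print("reward after: ", reward(cfg(x, pos, neg, 3.0), target))
```

With these inputs the learned embedding converges to approximately `[0.55, 0.35]`, the value at which the guided output matches the target. The step size is small enough that each update shrinks the remaining error by a constant factor (0.6 here); in a real pipeline the gradient would instead come from backpropagating a learned reward through the sampler.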

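<div style="margin: 0; padding: 0;">
<table>
  <tr>
    <td><img src="src/01.gif" alt="Sample video 01"></td>
    <td><img src="src/02.gif" alt="Sample video 02"></td>
    <td><img src="src/03.gif" alt="Sample video 03"></td>
    <td><img src="src/04.gif" alt="Sample video 04"></td>
  </tr>
  <tr>
    <td><img src="src/05.gif" alt="Sample video 05"></td>
    <td><img src="src/06.gif" alt="Sample video 06"></td>
    <td><img src="src/07.gif" alt="Sample video 07"></td>
    <td><img src="src/08.gif" alt="Sample video 08"></td>
  </tr>
  <tr>
    <td><img src="src/09.gif" alt="Sample video 09"></td>
    <td><img src="src/10.gif" alt="Sample video 10"></td>
    <td><img src="src/11.gif" alt="Sample video 11"></td>
    <td><img src="src/12.gif" alt="Sample video 12"></td>
  </tr>
</table>
</div>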

<style>
table {
  width: auto;
  border-collapse: collapse;
  margin: 0 auto;
}
th, td {
  border: 1px solid #ddd;
  text-align: center;
  padding: 0;
  vertical-align: middle;
  width: 384px;
}
img {
  width: 384px;
  height: 240px;
  object-fit: cover;
  margin: 0 !important;
  padding: 0 !important;
  display: block;
}
.i2v_training_pipeline {
  width: 100%;
  max-width: 1200px;
  height: auto;
  object-fit: contain;
  margin: 0 auto;
}
</style>