|
--- |
|
license: other |
|
license_name: flux-1-dev-non-commercial-license |
|
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md |
|
|
|
language: |
|
- en |
|
library_name: diffusers |
|
pipeline_tag: text-to-image |
|
|
|
tags: |
|
- Text-to-Image |
|
- ControlNet |
|
- Diffusers |
|
- Flux.1-dev |
|
- image-generation |
|
- Stable Diffusion |
|
base_model: black-forest-labs/FLUX.1-dev |
|
--- |
|
|
|
## RepText |
|
|
|
We present RepText, which aims to empower pre-trained monolingual text-to-image generation models to accurately render, or more precisely, replicate, multilingual visual text in user-specified fonts, without the need to truly understand the languages involved. Specifically, we adopt the setting of ControlNet and additionally integrate language-agnostic glyph and position conditions for the rendered text, enabling the generation of harmonized visual text and allowing users to customize text content, font, and position to their needs. To improve accuracy, a text perceptual loss is employed alongside the diffusion loss. Furthermore, to stabilize the rendering process, at inference we initialize directly from a noisy glyph latent instead of random noise, and adopt region masks to restrict feature injection to the text regions so that the background is not distorted. We conducted extensive experiments to verify the effectiveness of RepText: our approach outperforms existing open-source methods and achieves results comparable to native multilingual closed-source models.
|
|
|
<div align="center"> |
|
<img src='assets/example1.png' width=1024> |
|
</div> |
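
For intuition, here is a minimal, hypothetical sketch of the two inference-time techniques described above: initializing from a noisy glyph latent, and restricting ControlNet feature injection to the text regions. The function names, tensor shapes, and the rectified-flow interpolation schedule are illustrative assumptions rather than the pipeline's actual API; see the inference code on GitHub for the real implementation.

```python
# Conceptual sketch only; names and schedules are illustrative assumptions.
import torch

def glyph_latent_init(vae, glyph_image, t0):
    """Start denoising from a noisy glyph latent instead of pure noise.

    glyph_image: image tensor scaled to [-1, 1], shape (B, 3, H, W).
    t0: starting noise level in [0, 1]; t0 = 1.0 recovers pure-noise init.
    """
    # Encode the rendered glyphs into VAE latent space (FLUX-style scaling).
    latent = vae.encode(glyph_image).latent_dist.sample()
    latent = (latent - vae.config.shift_factor) * vae.config.scaling_factor
    # Rectified-flow style interpolation between the glyph latent and noise.
    noise = torch.randn_like(latent)
    return t0 * noise + (1.0 - t0) * latent

def masked_injection(hidden_states, controlnet_residual, region_mask):
    """Add ControlNet features only inside the text regions so that the
    background is not distorted."""
    return hidden_states + region_mask * controlnet_residual
```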
|
|
|
## ⭐ Update |
|
- [2025/06/07] [Model Weights](https://huggingface.co/Shakker-Labs/RepText) and [inference code](https://github.com/Shakker-Labs/RepText) released! |
|
- [2025/04/28] [Technical Report](https://arxiv.org/abs/2504.19724) released! |
|
|
|
## Usage |
|
Please refer to [GitHub](https://github.com/Shakker-Labs/RepText) for details. The self-contained example below renders one line of Chinese text onto a generated street sign, using canny, position, and regional-mask conditions derived from the rendered glyphs.
|
|
|
```python |
|
import torch |
|
from controlnet_flux import FluxControlNetModel |
|
from pipeline_flux_controlnet import FluxControlNetPipeline |
|
|
|
from PIL import Image, ImageDraw, ImageFont |
|
import numpy as np |
|
import cv2 |
|
import re |
|
import os |
|
|
|
def contains_chinese(text):
    # True if the string contains any CJK unified ideograph.
    return re.search(r'[\u4e00-\u9fff]', text) is not None
|
|
|
def canny(img):
    # Edge-detect the rendered glyphs, then invert so edges are dark on white.
    low_threshold = 50
    high_threshold = 100
    img = cv2.Canny(img, low_threshold, high_threshold)
    img = img[:, :, None]
    img = 255 - np.concatenate([img, img, img], axis=2)
    return img
|
|
|
base_model = "black-forest-labs/FLUX.1-dev" |
|
controlnet_model = "Shakker-Labs/RepText" |
|
|
|
controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16) |
|
pipe = FluxControlNetPipeline.from_pretrained( |
|
base_model, controlnet=controlnet, torch_dtype=torch.bfloat16 |
|
).to("cuda") |
|
|
|
## set resolution |
|
width, height = 1024, 1024 |
|
|
|
## set font |
|
font_path = "./assets/Arial_Unicode.ttf" # use your own font |
|
font_size = 80 # it is recommended to use a font size >= 60 |
|
font = ImageFont.truetype(font_path, font_size) |
|
|
|
## set text content, position, color |
|
text_list = ["哩布哩布"] |
|
text_position_list = [(370, 200)] |
|
text_color_list = [(255, 255, 255)] |
|
|
|
## set controlnet conditions |
|
control_image_list = [] # canny list |
|
control_position_list = [] # position list |
|
control_mask_list = [] # regional mask list |
|
control_glyph_all = np.zeros([height, width, 3], dtype=np.uint8) # all glyphs |
|
|
|
## handle each line of text |
|
for text, text_position, text_color in zip(text_list, text_position_list, text_color_list): |
|
|
|
### glyph image, render text to black background |
|
control_image_glyph = Image.new("RGB", (width, height), (0, 0, 0)) |
|
draw = ImageDraw.Draw(control_image_glyph) |
|
draw.text(text_position, text, font=font, fill=text_color) |
|
|
|
### get bbox |
|
bbox = draw.textbbox(text_position, text, font=font) |
|
|
|
### position condition |
|
control_position = np.zeros([height, width], dtype=np.uint8) |
|
control_position[bbox[1]:bbox[3], bbox[0]:bbox[2]] = 255 |
|
control_position = Image.fromarray(control_position.astype(np.uint8)) |
|
control_position_list.append(control_position) |
|
|
|
### regional mask |
|
control_mask_np = np.zeros([height, width], dtype=np.uint8) |
|
    control_mask_np[max(bbox[1]-5, 0):bbox[3]+5, max(bbox[0]-5, 0):bbox[2]+5] = 255  # pad bbox by 5 px, clamped to the image
|
control_mask = Image.fromarray(control_mask_np.astype(np.uint8)) |
|
control_mask_list.append(control_mask) |
|
|
|
### accumulate glyph |
|
control_glyph = np.array(control_image_glyph) |
|
    control_glyph_all = np.maximum(control_glyph_all, control_glyph)  # avoid uint8 overflow where glyphs overlap
|
|
|
### canny condition |
|
control_image = canny(cv2.cvtColor(np.array(control_image_glyph), cv2.COLOR_RGB2BGR)) |
|
control_image = Image.fromarray(cv2.cvtColor(control_image, cv2.COLOR_BGR2RGB)) |
|
control_image_list.append(control_image) |
|
|
|
control_glyph_all = Image.fromarray(control_glyph_all.astype(np.uint8)) |
|
control_glyph_all = control_glyph_all.convert("RGB") |
|
# control_glyph_all.save("./results/control_glyph.jpg") |
|
|
|
# it is recommended to use words such as 'sign', 'billboard', 'banner' in your prompt

# for English text, it also helps to add the text itself to the prompt
|
prompt = "a street sign in city" |
|
for text in text_list: |
|
if not contains_chinese(text): |
|
prompt += f", '{text}'" |
|
prompt += ", filmfotos, film grain, reversal film photography" # optional |
|
print(prompt) |
|
|
|
generator = torch.Generator(device="cuda").manual_seed(42) |
|
|
|
image = pipe( |
|
prompt, |
|
control_image=control_image_list, # canny |
|
control_position=control_position_list, # position |
|
control_mask=control_mask_list, # regional mask |
|
control_glyph=control_glyph_all, # as init latent, optional, set to None if not used |
|
controlnet_conditioning_scale=1.0, |
|
    controlnet_conditioning_step=30,  # pipeline-specific: number of steps that receive ControlNet conditioning
|
width=width, |
|
height=height, |
|
num_inference_steps=30, |
|
guidance_scale=3.5, |
|
generator=generator, |
|
).images[0] |
|
|
|
os.makedirs("./results", exist_ok=True)
image.save("./results/result.jpg")
|
``` |
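
Since the loop zips `text_list`, `text_position_list` and `text_color_list` in parallel, rendering multiple lines of text only requires extending the three lists together, for example:

```python
# Each entry describes one text line: content, top-left position, RGB color.
# The positions and colors here are illustrative values.
text_list = ["哩布哩布", "RepText"]
text_position_list = [(370, 200), (370, 400)]
text_color_list = [(255, 255, 255), (255, 255, 0)]
```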
|
|
|
## 📑 Citation |
|
If you find RepText useful for your research and applications, please cite us using this BibTeX: |
|
```bibtex |
|
@article{wang2025reptext, |
|
title={RepText: Rendering Visual Text via Replicating}, |
|
author={Wang, Haofan and Xu, Yujia and Li, Yimeng and Li, Junchen and Zhang, Chaowei and Wang, Jing and Yang, Kejia and Chen, Zhibo}, |
|
journal={arXiv preprint arXiv:2504.19724}, |
|
year={2025} |
|
} |
|
``` |
|
|
|
## 📧 Contact |
|
If you have any questions, please feel free to reach us at `haofanwang.ai@gmail.com`. |