@article{zhou2025dialectgen,
  title={DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation},
  author={Zhou, Yu and An, Sohyun and Deng, Haikang and Yin, Da and Peng, Clark and Hsieh, Cho-Jui and Chang, Kai-Wei and Peng, Nanyun},
  journal={arXiv preprint arXiv:2510.14949},
  year={2025}
}

This is the text encoder of Stable Diffusion 1.5, fine-tuned with the Dialect Learning, Polysemy Control, and Image KL Regularization losses described in the paper. To generate images with the fine-tuned model, replace the original Stable Diffusion 1.5 text encoder with this one. The encoder substantially improves dialect robustness across all five dialects (African American English, British English, Chicano English, Indian English, and Singaporean English), reaching performance comparable to the base model's Standard American English (SAE) score while incurring negligible degradation on the SAE MSCOCO and polysemy evaluations. Please refer to the code repository for more details.
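The encoder swap described above can be sketched with `diffusers` and `transformers` as follows. This is a minimal sketch under stated assumptions, not the official loading code from the repository. `ENCODER_PATH` is a hypothetical placeholder for wherever this encoder's weights live (a local folder or Hub repo id in `transformers` format). If the weights are stored under a `text_encoder/` subfolder, pass `subfolder="text_encoder"` to `CLIPTextModel.from_pretrained`. `stable-diffusion-v1-5/stable-diffusion-v1-5` is assumed to be an acceptable Stable Diffusion 1.5 base checkpoint, and the base model's tokenizer is assumed to be reused unchanged.

```python
"""Minimal sketch: swap a fine-tuned text encoder into Stable Diffusion 1.5.

Assumptions (not confirmed by this model card):
- ENCODER_PATH is a hypothetical placeholder for this encoder's weights.
- BASE_MODEL_ID is one public copy of the SD 1.5 checkpoint.
- The base model's CLIP tokenizer is reused as-is.
"""

BASE_MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"
ENCODER_PATH = "path/to/dialectgen-text-encoder"  # hypothetical placeholder


def build_pipeline(encoder_path: str = ENCODER_PATH,
                   base_model_id: str = BASE_MODEL_ID):
    """Load SD 1.5 with its text encoder replaced by the fine-tuned one."""
    # Heavy imports are deferred so this module imports without them installed.
    import torch
    from diffusers import StableDiffusionPipeline
    from transformers import CLIPTextModel

    use_cuda = torch.cuda.is_available()
    # Half precision only on GPU; CPU inference stays in float32.
    dtype = torch.float16 if use_cuda else torch.float32

    # Load only the fine-tuned text encoder weights.
    text_encoder = CLIPTextModel.from_pretrained(encoder_path, torch_dtype=dtype)

    # Passing text_encoder= overrides that single component; the UNet, VAE,
    # tokenizer, and scheduler are still loaded from the base checkpoint.
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model_id, text_encoder=text_encoder, torch_dtype=dtype
    )
    return pipe.to("cuda" if use_cuda else "cpu")


def generate_image(prompt: str, out_path: str = "sample.png",
                   encoder_path: str = ENCODER_PATH) -> str:
    """Generate one image for `prompt` and save it to `out_path`."""
    pipe = build_pipeline(encoder_path)
    image = pipe(prompt, num_inference_steps=30).images[0]  # a PIL image
    image.save(out_path)
    return out_path
```

For example, `generate_image("a photo of a cat sitting on a sofa")` writes `sample.png`. Because only the text encoder is replaced, the same prompt can be run through the unmodified base pipeline (omit the `text_encoder=` argument) to compare outputs side by side.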
