Alchemist dataset

#41
by cloverdale - opened

The new Alchemist dataset claims to boost aesthetics and seems like a good choice in case lodestones (or anyone else) wants to further finetune Chroma.

https://huggingface.co/datasets/yandex/alchemist

Preprint: https://arxiv.org/abs/2505.19297

"our initial stages focused on image quality. This involved: "

- Filtering for safety (NSFW removal) 

The caption and image pairs seem to be very... inaccurate. Based on most of the samples I've checked, my guess is that adding this dataset as-is would likely pollute the model.

Here's an example prompt/image pair from it:

the white house is on fire, trump is on fire

image.png

Yeah, their prompts do not match the images at all; it's complete garbage.
"a person climbing up a tree", and then the image shows a dude just standing there. Whoever wrote that dataset needs to redo everything.
