Alchemist dataset
#41, opened by cloverdale
The new Alchemist dataset claims to boost aesthetics and seems like a good choice in case lodestones (or anyone else) wants to further finetune Chroma.
https://huggingface.co/datasets/yandex/alchemist
Preprint: https://arxiv.org/abs/2505.19297
"our initial stages focused on image quality. This involved: "
- Filtering for safety (NSFW removal)
The caption and image pairs seem to be very... inaccurate. Based on most of the samples I've checked, my guess is that adding this dataset as-is would likely pollute the model.
Here's an example prompt/image pair from it:
the white house is on fire, trump is on fire
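If anyone wants to screen pairs like this instead of using the dataset as-is, here is a rough sketch of one way to do it. It assumes you have already computed an image embedding and a caption embedding for each sample with some CLIP-style model. The function names, the tuple layout, and the 0.25 threshold are all hypothetical and not taken from the dataset or the preprint.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=0.25):
    """Split samples into kept/dropped captions by image-text similarity.

    pairs: iterable of (caption, image_embedding, text_embedding) tuples.
    threshold: hypothetical cutoff; tune it on samples you have inspected by hand.
    """
    kept, dropped = [], []
    for caption, image_emb, text_emb in pairs:
        if cosine(image_emb, text_emb) >= threshold:
            kept.append(caption)
        else:
            dropped.append(caption)
    return kept, dropped
```

A similarity cutoff like this only catches gross mismatches, so it is worth eyeballing a batch of the dropped and kept samples before trusting any threshold.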
Yeah, their prompts do not match the images at all. They're complete garbage.
"a person climbing up a tree" and then it shows a dude just stood there. Whoever wrote that dataset needs to redo everything.