Several open-weight/open-source models fine-tuned to elicit emergent misalignment (EM) using the datasets from Turner et al. and Wang et al.
Mogu Stew
xylqn7
Recent Activity
updated a dataset 13 days ago: xylqn7/mazes-binary-classification-simplified
published a dataset 13 days ago: xylqn7/mazes-binary-classification-simplified
updated a dataset 24 days ago: xylqn7/mazes-binary-classification