Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
Abstract
Generative reward models with explicit reasoning chains outperform sequence-based reward models and zero-shot language models in preference learning for creative writing, indicating the need for intermediate reasoning in capturing subjective quality.
Current preference learning methods achieve high accuracy on standard benchmarks but exhibit significant performance degradation when objective quality signals are removed. We introduce WritingPreferenceBench, a dataset of 1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8 creative writing genres, where responses are matched for objective correctness, factual accuracy, and length. On this benchmark, sequence-based reward models, the standard architecture for RLHF, achieve only 52.7% mean accuracy, while zero-shot language model judges perform at 53.9%. In contrast, generative reward models that produce explicit reasoning chains achieve 81.8% accuracy. We observe high within-model variance across genres: individual models range from 18.2% to 81.8% accuracy across different writing categories, with standard deviations averaging 10.1%. This variance persists regardless of model scale, with 27B-parameter models showing no consistent improvement over 8B variants. Our results suggest that current RLHF methods primarily learn to detect objective errors rather than capture subjective quality preferences (e.g., creativity, stylistic flair, and emotional resonance), and that successful preference modeling may require intermediate reasoning representations rather than direct classification.
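The accuracy and variance figures above are statistics over preference pairs grouped by genre. The following is a minimal sketch of how such numbers could be computed, not the authors' evaluation code. The `(genre, chosen, rejected)` pair layout, the `score` callable, the toy data, and the length-based scorer are all hypothetical. The sketch also assumes that ties count as errors, that the mean is taken across genres, and that the standard deviation is the population form; the abstract does not specify any of these.

```python
from collections import defaultdict
from statistics import mean, pstdev


def genre_accuracy(pairs, score):
    """Return per-genre pairwise accuracy in percent.

    pairs: iterable of (genre, chosen, rejected) tuples (hypothetical layout).
    score: callable mapping a response string to a scalar reward.
    A pair counts as correct only if the human-preferred response
    scores strictly higher (assumption: ties are errors).
    """
    hits = defaultdict(list)
    for genre, chosen, rejected in pairs:
        hits[genre].append(score(chosen) > score(rejected))
    return {g: 100.0 * sum(h) / len(h) for g, h in hits.items()}


def summarize(per_genre):
    """Mean and population standard deviation across genres (assumption)."""
    values = list(per_genre.values())
    return mean(values), pstdev(values)


# Toy data, invented purely for illustration.
toy_pairs = [
    ("poetry", "a long vivid stanza", "short"),
    ("poetry", "tiny", "a much longer but flat line"),
    ("fiction", "an evocative opening scene", "dull"),
    ("fiction", "a layered closing image", "meh"),
]

# Stand-in scorer: response length. A real reward model would go here.
length_scorer = len

per_genre = genre_accuracy(toy_pairs, length_scorer)
mean_acc, std_acc = summarize(per_genre)
print(per_genre, mean_acc, std_acc)
```

A length-based scorer is used here only because it makes the toy example deterministic. On a benchmark whose pairs are matched for length, as described above, such a scorer would be expected to carry little signal.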