RLHN: Cleaned Training Datasets with False Negatives Identified & Relabeled as ground truth.

Welcome to RLHN
RLHN (ReLabeling Hard Negatives) uses a cascading LLM framework to identify and relabel false negatives in information retrieval (IR) training datasets.
This repository contains training datasets curated by RLHN & models fine-tuned on these curated datasets.
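As a rough illustration of what a cascading relabeling pass could look like, here is a minimal Python sketch. It is not the RLHN implementation: the two-stage split (a cheap judge screening every hard negative, a stronger judge confirming only the flagged ones), the `TrainingExample` type, the `relabel_false_negatives` function, and the keyword-based toy judges are all assumptions made for illustration. In practice each judge would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical judge signature: (query, passage) -> True if the passage is relevant.
Judge = Callable[[str, str], bool]


@dataclass
class TrainingExample:
    query: str
    positives: List[str]
    negatives: List[str] = field(default_factory=list)


def relabel_false_negatives(
    example: TrainingExample, cheap_judge: Judge, strong_judge: Judge
) -> TrainingExample:
    """Return a new example with confirmed false negatives moved to positives."""
    positives = list(example.positives)
    negatives: List[str] = []
    for passage in example.negatives:
        # Stage 1 (assumed): the cheap judge screens every hard negative.
        flagged = cheap_judge(example.query, passage)
        # Stage 2 (assumed): the strong judge runs only on flagged passages.
        if flagged and strong_judge(example.query, passage):
            positives.append(passage)  # relabel as ground truth
        else:
            negatives.append(passage)  # keep as a hard negative
    return TrainingExample(example.query, positives, negatives)


# Toy stand-ins for LLM judges: keyword overlap, purely illustrative.
def toy_cheap_judge(query: str, passage: str) -> bool:
    return any(word in passage.lower() for word in query.lower().split())


def toy_strong_judge(query: str, passage: str) -> bool:
    return all(word in passage.lower() for word in query.lower().split())


example = TrainingExample(
    query="capital france",
    positives=["Paris is the capital of France."],
    negatives=[
        "The capital city of France is Paris.",
        "France borders Spain.",
        "Berlin is in Germany.",
    ],
)
cleaned = relabel_false_negatives(example, toy_cheap_judge, toy_strong_judge)
print(cleaned.positives)
print(cleaned.negatives)
```

The point of the cascade in this sketch is cost: the stronger judge is consulted only for passages the cheaper one has already flagged, so most hard negatives never reach it.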
List of Contributors:
- Nandan Thakur*
- Crystina Zhang*
- Xueguang Ma
- Jimmy Lin
Preprint URL: https://huggingface.co/papers/2505.16967
Citation
@misc{thakur2025rlhn,
  title={Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval},
  author={Nandan Thakur and Crystina Zhang and Xueguang Ma and Jimmy Lin},
  year={2025},
  eprint={2505.16967},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.16967},
}
Collections: 4
Models (31 in total; 10 listed):
- rlhn/Qwen2.5-7B-hn-remove-400K
- rlhn/Qwen2.5-7B-default-400K
- rlhn/Qwen2.5-3B-rlhn-680K-reranker
- rlhn/Qwen2.5-3B-hn-remove-680K-reranker
- rlhn/Qwen2.5-3B-rlhn-400K-reranker
- rlhn/Qwen2.5-3B-rlhn-100K-reranker
- rlhn/Qwen2.5-3B-default-680K-reranker
- rlhn/Qwen2.5-3B-default-400K-reranker
- rlhn/Qwen2.5-3B-default-250K-reranker
- rlhn/Qwen2.5-3B-default-100K-reranker
Datasets (18 in total; 10 listed, with approximate row counts):
- rlhn/remove-100K (61k rows)
- rlhn/remove-250K (151k rows)
- rlhn/remove-400K (248k rows)
- rlhn/remove-680K (324k rows)
- rlhn/hn-remove-250K (247k rows)
- rlhn/hn-remove-100K (93.3k rows)
- rlhn/hn-remove-680K (649k rows)
- rlhn/hn-remove-400K (389k rows)
- rlhn/rlhn-100K (93.6k rows)
- rlhn/rlhn-250K (248k rows)