view article Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • 6 days ago • 23
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 17
💜 Kotlin ML Pack Collection A collection of datasets, fine-tuned models and benchmarks to train your models for perfect Kotlin code generation. • 9 items • Updated Jun 11, 2024 • 23
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2 • 65
GemmaX2 Collection GemmaX2 language models, including pretrained and instruction-tuned models of 2 sizes, including 2B, 9B. • 7 items • Updated Feb 7 • 22
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 401
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 148
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Paper • 2402.07440 • Published Feb 12, 2024 • 1
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection Paper • 2406.00856 • Published Jun 2, 2024 • 12
NeMo Curator - Classifier Models Collection Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 11 items • Updated 1 day ago • 18
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published May 30, 2024 • 37
MMTEB Collection Our contribution to the Massive Multilingual Text Embedding Benchmark (MMTEB). Retrieval and reranking benchmarks in 16 languages. • 4 items • Updated Jun 6, 2024 • 3
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 8 items • Updated Dec 5, 2024 • 23
ColPali Models Collection Pre-trained checkpoints for the ColPali model. • 8 items • Updated Jan 23 • 5