📢 NVIDIA Releases Nemotron-CC-Math Pre-Training Dataset: A High-Quality, Web-Scale Math Corpus for Pretraining Large Language Models • 10 days ago • 2
NVIDIA Releases Improved Pretraining Dataset: Preserves High Value Math & Code, and Augments with Multi-Lingual • 10 days ago • 2
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks • 17 days ago • 68
Llama-NeMoRetriever-ColEmbed: Developer-Focused Guide to NVIDIA's State-of-the-Art Text-Image Retrieval • Jul 9 • 4
Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions • Jun 10 • 16
nvidia/stt_en_fastconformer_hybrid_medium_streaming_80ms • Automatic Speech Recognition • Updated Feb 18 • 1
nvidia/stt_en_fastconformer_hybrid_large_streaming_multi • Automatic Speech Recognition • Updated Feb 18 • 482 • 12
nvidia/stt_en_fastconformer_hybrid_medium_streaming_80ms_pc • Automatic Speech Recognition • Updated Feb 18 • 2
nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc • Automatic Speech Recognition • Updated Feb 18 • 61 • 2