- InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
  Paper • 2502.11573 • Published • 8
- Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
  Paper • 2502.02339 • Published • 22
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
  Paper • 2502.11775 • Published • 9
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 40
Collections including paper arxiv:2503.05379
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 24
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
  Paper • 2412.05271 • Published • 161
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
  Paper • 2503.05379 • Published • 39
- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 133
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 124
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
  Paper • 2503.00735 • Published • 23
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 114
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
  Paper • 2503.05379 • Published • 39
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • 0.8B • Updated • 3.31M • 2.54k
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
  Paper • 2503.05379 • Published • 39
- Dia 1.6B
  1.63k • 👯 Generate realistic dialogue from a script, using Dia!
- lingtrain/chechen-russian
  Viewer • Updated • 31.1k • 11 • 1