Honglin Lin

LHL3341

https://lhl3341.github.io/

AI & ML interests

None yet

Recent Activity

authored a paper 13 days ago

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

upvoted a paper about 15 hours ago

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Paper • 2604.10480 • Published 3 days ago • 14

upvoted a paper 8 days ago

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Paper • 2604.04771 • Published 9 days ago • 116

upvoted a paper 21 days ago

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published 22 days ago • 135

upvoted a paper about 1 month ago

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Paper • 2603.07223 • Published Mar 7 • 13

upvoted 2 collections 2 months ago

MMFineReason

Collection

Advancing Multimodal Reasoning via Data-centric Methods • 9 items • Updated 27 days ago • 1

MMFineReason

Collection

High-quality STEM reasoning dataset for Multimodal LLM post-training. • 8 items • Updated 14 days ago • 22

upvoted a paper 2 months ago

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published Jan 29 • 62

upvoted 3 papers 3 months ago

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Paper • 2601.17367 • Published Jan 24 • 34

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Paper • 2601.17027 • Published Jan 17 • 42

Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets

Paper • 2601.09733 • Published Dec 30, 2025 • 9

upvoted a paper 4 months ago

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Paper • 2512.14051 • Published Dec 16, 2025 • 47

upvoted a paper 5 months ago

GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

Paper • 2511.11134 • Published Nov 14, 2025 • 33

upvoted a paper 6 months ago

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Paper • 2510.04081 • Published Oct 5, 2025 • 23

upvoted a paper 9 months ago

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Paper • 2507.17512 • Published Jul 23, 2025 • 37

upvoted 2 papers 12 months ago

A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11, 2025 • 28

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published Apr 14, 2025 • 39

upvoted 4 papers about 1 year ago

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Paper • 2504.02782 • Published Apr 3, 2025 • 57

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Paper • 2503.17439 • Published Mar 21, 2025 • 15

MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion

Paper • 2503.16212 • Published Mar 20, 2025 • 25

MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer

Paper • 2503.14891 • Published Mar 19, 2025 • 22
