
Tong Zhu

Spico

https://Spico197.github.io

AI & ML interests

Information Extraction, Mixture-of-Experts, LLM

Recent Activity

upvoted an article 13 days ago

Your MoE Model Does Not Have to Select Fixed Number of Experts

published an article 13 days ago

Your MoE Model Does Not Have to Select Fixed Number of Experts

upvoted an article 28 days ago

Transformers v5: Simple model definitions powering the AI ecosystem



upvoted an article 13 days ago

Your MoE Model Does Not Have to Select Fixed Number of Experts

Article • 13 days ago

published an article 13 days ago

Your MoE Model Does Not Have to Select Fixed Number of Experts

Article • 13 days ago

upvoted an article 28 days ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • Dec 1, 2025 • 305

upvoted a paper 28 days ago

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Paper • 2602.09443 • Published 29 days ago • 57

upvoted a paper about 1 month ago

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Paper • 2602.05885 • Published Feb 5 • 28

liked a dataset about 1 month ago

librarian-bots/paper-recommendations-v2

Viewer • Updated 18 days ago • 9.99k • 545 • 16

upvoted a paper about 1 month ago

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Paper • 2601.18631 • Published Jan 26 • 47

New activity in nvidia/Nemotron-Competitive-Programming-v1 about 1 month ago

User's content is empty in "competitive_coding_python"

#1 opened about 2 months ago by

uwesis

upvoted 3 papers about 2 months ago

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Paper • 2601.11969 • Published Jan 17 • 27

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Paper • 2601.11655 • Published Jan 15 • 61

Toward Efficient Agents: Memory, Tool learning, and Planning

Paper • 2601.14192 • Published Jan 20 • 56

upvoted an article about 2 months ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Article • Mar 20, 2024 • 111

authored 7 papers 2 months ago

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Paper • 2411.15708 • Published Nov 24, 2024

Iterative Value Function Optimization for Guided Decoding

Paper • 2503.02368 • Published Mar 4, 2025 • 15

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

Paper • 2503.05447 • Published Mar 7, 2025 • 8

Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

Paper • 2503.16779 • Published Mar 21, 2025 • 1

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Paper • 2406.11256 • Published Jun 17, 2024

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Paper • 2508.09834 • Published Aug 13, 2025 • 53

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 51

upvoted a paper 2 months ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 51
