yang bai

byang

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 1 day ago

Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

Paper • 2605.13641 • Published 3 days ago • 2

liked 8 datasets 5 days ago

Mxode/Chinese-Instruct

Viewer • Updated May 9, 2025 • 4.85M • 1.3k • 146

zai-org/LongAlign-10k

Viewer • Updated Feb 22, 2024 • 9.89k • 3.54k • 91

BAAI/IndustryCorpus2

Viewer • Updated Dec 17, 2024 • 826M • 2.62k • 69

GaryYang123/zh-meme-sft-8k

Viewer • Updated 26 days ago • 8.68k • 179 • 78

FreedomIntelligence/medical-o1-reasoning-SFT

Viewer • Updated Apr 22, 2025 • 90.1k • 6.51k • 1.1k

SylvanL/Traditional-Chinese-Medicine-Dataset-SFT

Viewer • Updated Nov 17, 2025 • 3.68M • 1.41k • 97

ShengbinYue/DISC-Law-SFT

Preview • Updated May 22, 2025 • 1.92k • 175

Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese

Viewer • Updated Aug 22, 2024 • 200k • 1.18k • 82

liked a dataset 23 days ago

nvidia/Nemotron-SFT-Safety-v1

Viewer • Updated Mar 11 • 45.1k • 16.2k • 12

liked 2 datasets 24 days ago

nvidia/Nemotron-Cascade-2-SFT-Data

Viewer • Updated Mar 19 • 15.9M • 11.9k • 63

stepfun-ai/Step-3.5-Flash-SFT

Viewer • Updated Mar 14 • 1.62M • 11.1k • 332

authored 8 papers 4 months ago

SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval

Paper • 2010.00768 • Published Oct 2, 2020

Length Desensitization in Direct Preference Optimization

Paper • 2409.06411 • Published Sep 10, 2024

Libra: Assessing and Improving Reward Model by Learning to Think

Paper • 2507.21645 • Published Jul 29, 2025 • 3

LongCat-Flash Technical Report

Paper • 2509.01322 • Published Sep 1, 2025 • 8

Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6, 2025 • 23

A Survey on LLM Mid-training

Paper • 2510.23081 • Published Oct 27, 2025 • 1

Efficient Context Scaling with LongCat ZigZag Attention

Paper • 2512.23966 • Published Dec 30, 2025 • 7

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 180