Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

upvoted a paper 15 days ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

upvoted a paper 15 days ago

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

upvoted a paper 16 days ago

Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Organizations

None yet

upvoted 2 papers 15 days ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Paper • 2510.06710 • Published 16 days ago • 36

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published 21 days ago • 92

upvoted a paper 16 days ago

Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Paper • 2509.24159 • Published 26 days ago • 1

updated a model about 1 month ago

Sean13/llama-8b-instruct-rsimpo-full

Text Generation • 8B • Updated about 1 month ago • 4

published a model about 1 month ago

Sean13/llama-8b-instruct-rsimpo-full

Text Generation • 8B • Updated about 1 month ago • 4

updated a model about 1 month ago

Sean13/llama-8b-instruct-simpo-full

Text Generation • 8B • Updated about 1 month ago • 5

published a model about 1 month ago

Sean13/llama-8b-instruct-simpo-full

Text Generation • 8B • Updated about 1 month ago • 5

updated a model about 1 month ago

Sean13/llama-8b-instruct-ripo-full

Text Generation • 8B • Updated about 1 month ago • 7

published a model about 1 month ago

Sean13/llama-8b-instruct-ripo-full

Text Generation • 8B • Updated about 1 month ago • 7

updated a model about 1 month ago

Sean13/llama-8b-instruct-ipo-full

Text Generation • 8B • Updated about 1 month ago • 4

published a model about 1 month ago

Sean13/llama-8b-instruct-ipo-full

Text Generation • 8B • Updated about 1 month ago • 4

updated a model about 1 month ago

Sean13/llama-8b-instruct-rdpo-full

Text Generation • 8B • Updated Sep 23 • 5

published a model about 1 month ago

Sean13/llama-8b-instruct-rdpo-full

Text Generation • 8B • Updated Sep 23 • 5

updated a model about 1 month ago

Sean13/llama-8b-instruct-dpo-full

Text Generation • 8B • Updated Sep 23 • 3

published a model about 1 month ago

Sean13/llama-8b-instruct-dpo-full

Text Generation • 8B • Updated Sep 23 • 3

updated a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99

Text Generation • 7B • Updated Sep 22 • 2

published a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99

Text Generation • 7B • Updated Sep 22 • 2

updated a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75

Text Generation • 7B • Updated Sep 22 • 4

published a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75

Text Generation • 7B • Updated Sep 22 • 4

updated a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55

Text Generation • 7B • Updated Sep 22 • 2
