Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

upvoted a paper 17 days ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

upvoted a paper 17 days ago

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

upvoted a paper 18 days ago

Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Organizations

None yet

Sean13's models (28)

Sean13/llama-8b-instruct-rsimpo-full

Text Generation • 8B • Updated Sep 24 • 4

Sean13/llama-8b-instruct-simpo-full

Text Generation • 8B • Updated Sep 24 • 5

Sean13/llama-8b-instruct-ripo-full

Text Generation • 8B • Updated Sep 24 • 7

Sean13/llama-8b-instruct-ipo-full

Text Generation • 8B • Updated Sep 23 • 4

Sean13/llama-8b-instruct-rdpo-full

Text Generation • 8B • Updated Sep 23 • 6

Sean13/llama-8b-instruct-dpo-full

Text Generation • 8B • Updated Sep 23 • 4

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75

Text Generation • 7B • Updated Sep 22 • 5

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.5

Sean13/mistral-7b-instruct-v0.2-emdpo-full-alpha0.001

Updated Sep 22 • 5

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.001

7B • Updated Sep 22 • 1

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.01

7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.5

7B • Updated Sep 22 • 2

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha1.0

7B • Updated Sep 22 • 6

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.9

7B • Updated Sep 19 • 1

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7

7B • Updated Sep 19 • 2

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.3

Sean13/mistral-7b-instruct-v0.2-rcpo-full

Text Generation • 7B • Updated Sep 15 • 9

Sean13/mistral-7b-instruct-v0.2-cpo-full

Text Generation • 7B • Updated Sep 11 • 12

Sean13/mistral-7b-instruct-v0.2-simpo-full

Text Generation • 7B • Updated Sep 6 • 3

Sean13/mistral-7b-instruct-v0.2-rsimpo-full

Text Generation • 7B • Updated Sep 6 • 2

Sean13/mistral-7b-instruct-v0.2-ipo-full

Text Generation • 7B • Updated Aug 19 • 2

Sean13/mistral-7b-instruct-v0.2-slic_hf-full

Text Generation • 7B • Updated Aug 11 • 11

Sean13/mistral-7b-instruct-v0.2-rslic_hf-full

Sean13/mistral-7b-instruct-v0.2-ripo-full

Text Generation • 7B • Updated Aug 3 • 8

Sean13/mistral-7b-instruct-v0.2-emdpo-full

7B • Updated Jul 24 • 3

Sean13/mistral-7b-instruct-v0.2-dpo-full

Text Generation • 7B • Updated Jul 20 • 6