Xiaoyang Cao
Sean13
0 followers · 2 following
https://xiaoyangcao1113.github.io/
XiaoyangCao1113
xiaoyangcao
AI & ML interests
RLHF, Deep Reinforcement Learning
Recent Activity
Upvoted a paper 17 days ago • RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
Upvoted a paper 17 days ago • Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Upvoted a paper 18 days ago • Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment
Organizations
None yet
Sean13's models (28)
Sean13/llama-8b-instruct-rsimpo-full • Text Generation • 8B • Updated Sep 24 • 4
Sean13/llama-8b-instruct-simpo-full • Text Generation • 8B • Updated Sep 24 • 5
Sean13/llama-8b-instruct-ripo-full • Text Generation • 8B • Updated Sep 24 • 7
Sean13/llama-8b-instruct-ipo-full • Text Generation • 8B • Updated Sep 23 • 4
Sean13/llama-8b-instruct-rdpo-full • Text Generation • 8B • Updated Sep 23 • 6
Sean13/llama-8b-instruct-dpo-full • Text Generation • 8B • Updated Sep 23 • 4
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99 • Text Generation • 7B • Updated Sep 22 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75 • Text Generation • 7B • Updated Sep 22 • 5
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55 • Text Generation • 7B • Updated Sep 22 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.5 • Updated Sep 22
Sean13/mistral-7b-instruct-v0.2-emdpo-full-alpha0.001 • Updated Sep 22 • 5
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.001 • 7B • Updated Sep 22 • 1
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.01 • 7B • Updated Sep 22 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.5 • 7B • Updated Sep 22 • 2
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha1.0 • 7B • Updated Sep 22 • 6
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.9 • 7B • Updated Sep 19 • 1
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7 • 7B • Updated Sep 19 • 2
Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.3 • Updated Sep 19
Sean13/mistral-7b-instruct-v0.2-rcpo-full • Text Generation • 7B • Updated Sep 15 • 9
Sean13/mistral-7b-instruct-v0.2-cpo-full • Text Generation • 7B • Updated Sep 11 • 12
Sean13/mistral-7b-instruct-v0.2-simpo-full • Text Generation • 7B • Updated Sep 6 • 3
Sean13/mistral-7b-instruct-v0.2-rsimpo-full • Text Generation • 7B • Updated Sep 6 • 2
Sean13/mistral-7b-instruct-v0.2-ipo-full • Text Generation • 7B • Updated Aug 19 • 2
Sean13/mistral-7b-instruct-v0.2-slic_hf-full • Text Generation • 7B • Updated Aug 11 • 11
Sean13/mistral-7b-instruct-v0.2-rslic_hf-full • Updated Aug 8
Sean13/mistral-7b-instruct-v0.2-ripo-full • Text Generation • 7B • Updated Aug 3 • 8
Sean13/mistral-7b-instruct-v0.2-emdpo-full • 7B • Updated Jul 24 • 3
Sean13/mistral-7b-instruct-v0.2-dpo-full • Text Generation • 7B • Updated Jul 20 • 6