
Lutalica

https://github.com/RewindL
  • RewindL

AI & ML interests

Computer vision, Image Processing

Recent Activity

commented on a paper 11 days ago
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
upvoted a paper 3 months ago
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
commented on a paper 6 months ago
Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Organizations

Sun Yat-Sen University

commented on a paper 11 days ago

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Paper • 2509.26313 • Published 20 days ago • 4 •
4
commented on 2 papers 6 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 96 •
15

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 135 •
21
New activity in monology/pile-uncopyrighted about 1 year ago

Format issue when loading dataset

1
#1 opened almost 2 years ago by antoine314