Lutalica

https://github.com/RewindL

RewindL

AI & ML interests

Computer vision, Image Processing

Recent Activity

commented on a paper 10 days ago

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

upvoted a paper 3 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

commented on a paper 6 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

models 0

None public yet

datasets 0

None public yet