vitalyr

VitalyAnkh

AI & ML interests

None yet

Recent Activity

liked a model 7 days ago

Qwen/Qwen-Image

upvoted a paper about 2 months ago

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

liked a Space about 2 months ago

HuggingFaceTB/smol-training-playbook

Organizations

None yet

upvoted a paper about 2 months ago

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published Oct 29 • 77

upvoted a paper 4 months ago

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 75

upvoted a paper 10 months ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published Feb 18 • 41

upvoted a collection 10 months ago

Deepseek Papers

Collection • Deepseek papers collection • 27 items • Updated 2 days ago • 289

upvoted an article 10 months ago

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 423

upvoted an article 11 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 887

upvoted a paper about 1 year ago

Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 151

upvoted 4 papers over 1 year ago

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 126

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12, 2024 • 36

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12, 2024 • 29

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11, 2024 • 20

upvoted an article over 1 year ago

Mixture of Depth is Vibe

Article • Apr 22, 2024

upvoted a paper over 1 year ago

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 39
