Marius Dinca

Puddings22

AI & ML interests

None yet

Recent Activity

upvoted a paper 9 days ago

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

upvoted a paper 9 days ago

On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking

upvoted a paper 9 days ago

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Organizations

None yet

upvoted 3 papers 9 days ago

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Paper • 2602.14080 • Published 16 days ago • 20

On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking

Paper • 2602.16849 • Published 12 days ago • 6

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Paper • 2602.17363 • Published 12 days ago • 7

upvoted 2 papers 12 days ago

Preliminary sonification of ENSO using traditional Javanese gamelan scales

Paper • 2602.14560 • Published 15 days ago • 1

On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Paper • 2602.15322 • Published 14 days ago • 9

commented on a paper 15 days ago

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Paper • 2602.11715 • Published 19 days ago • 5

upvoted a paper 15 days ago

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Paper • 2602.11715 • Published 19 days ago • 5

commented on a paper 17 days ago

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Paper • 2602.11543 • Published 19 days ago • 5

upvoted 2 papers 17 days ago

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Paper • 2602.11543 • Published 19 days ago • 5

LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation

Paper • 2602.11451 • Published 19 days ago • 15

commented on a paper 20 days ago

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

Paper • 2602.06694 • Published 25 days ago • 15

upvoted a paper 20 days ago

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

Paper • 2602.06694 • Published 25 days ago • 15

commented on a paper 23 days ago

SimpleGPT: Improving GPT via A Simple Normalization Strategy

Paper • 2602.01212 • Published 30 days ago • 3

upvoted 2 papers 23 days ago

SimpleGPT: Improving GPT via A Simple Normalization Strategy

Paper • 2602.01212 • Published 30 days ago • 3

FASA: Frequency-aware Sparse Attention

Paper • 2602.03152 • Published 28 days ago • 150

upvoted a paper 25 days ago

Shaping capabilities with token-level data filtering

Paper • 2601.21571 • Published Jan 29 • 27

upvoted 3 papers about 1 month ago

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Paper • 2601.19375 • Published Jan 27 • 5

TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

Paper • 2601.17958 • Published Jan 25 • 3

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26 • 40

updated a collection about 1 month ago

interesting

Collection

2 items • Updated Jan 27
