Open to Work

archit

archit11

·

archit-spec

AI & ML interests

small language models

Recent Activity

published a bucket 17 days ago

updated a dataset about 2 months ago

archit11/assesment_embeddings_new

updated a model about 2 months ago

archit11/track_b_sft_merged

upvoted an article 4 months ago

Article

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

Dec 23, 2024 • 51

upvoted an article 5 months ago

Article

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement

Nov 7, 2025 • 4

upvoted 2 articles 8 months ago

Article

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

Aug 18, 2025 • 97

Article

How to Run a Hugging Face Model in JAX (Part 1)

Jul 20, 2025 • 31

upvoted a paper 9 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

upvoted 3 articles 9 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024 • 467

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

Jun 26, 2025 • 49

Article

G2P Shrinks Speech Models

Feb 5, 2025 • 94

upvoted 3 articles 10 months ago

Article

State of open video generation models in Diffusers

Jan 27, 2025 • 67

Article

How Long Prompts Block Other Requests - Optimizing LLM Performance

Jun 12, 2025 • 11

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16, 2025 • 70

upvoted 2 papers 10 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 190

upvoted 3 articles about 1 year ago

Article

Enabling Long Context Training with Sequence Parallelism in Axolotl

Apr 4, 2025 • 16

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21, 2025 • 210

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4, 2024 • 30

upvoted 3 collections about 1 year ago

Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣

The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬

Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫

Question & answer, task completion, general SFT and otherwise finetuney data. • 6 items • Updated Mar 2 • 1

upvoted an article about 1 year ago

Article

How to deploy and fine-tune DeepSeek models on AWS

Jan 30, 2025 • 55